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Abstract 


The following dissertation presents the study that I performed at the Chem- 
istry Department of the University of Florence and at the European Laboratory 
for Non-Linear Spectroscopy (LENS) to recover and elucidate structural, dynam- 
ical, and spectroscopic molecular features adopting computer simulations. In 
particular, here ab initio molecular dynamics simulations and time-frequency 
analysis are the most employed “tools”, in order to have a better understand- 
ing of the origins of vibrational features. Hydrogen-bonding is the main inter- 
molecular interaction that can affect the vibrational spectra, and occurs in all the 
systems studied here. It is shown how the hydrogen-bonding usually induces 
a redshift on the vibrational frequencies of the groups engaged in it, and how 
this redshift nicely correlates with the strength of this type of bond, as obtained 
from (so-called) “first principles simulations”. While the redshift induced by the 
hydrogen-bond is the most common occurrence, I have also studied a system 
where hydrogen-bonding induces a blueshift. Recovering and explaining this 
somewhat unusual effect, well established experimentally, required particular 
computational efforts. Hydrogen-bonding can manifest itself also as a bifurcated 
interaction between one donor and two acceptor centres. This bifurcated config- 
uration is usually seen just as a very brief intermediate step occurring in water 
during breaking and creation of “true” hydrogen-bonds, but, in confined water, 
it has longer lifetimes, thus allowing it to be studied by both spectroscopic and 
computational means. The computational protocols implemented and adopted 
in this dissertation allow a direct comparison between structural features and 
vibrational spectrum, highlighting how the formers influence the latter. 
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Introduction: Vibrations, H-Bonding, and Structure 


Vibrations occurring in molecules are a powerful tool to investigate other 
relevant properties. The time scale of the molecular vibrations ranges from 
tens of femtoseconds to picoseconds, allowing to probe the dynamics and 
the structural features of both simple and complex systems. As a matter 
of fact, even small changes of relative conformations and environment 
interactions can have deep consequences in the vibrational spectra. 

In particular, molecular vibrations are affected by the presence of the 
hydrogen-bond [Jef]. The hydrogen-bond is a an inter-molecular interac- 
tion of type A-H --- B, where A and B contains atoms with high elec- 
tronegativity, ? such as nitrogen and oxygen. The hydrogen atom H is 
covalently bound to the donor atom of A, the other atom from B acts as 
the acceptor: these concepts are briefly illustrated in fig. 1. 

The hydrogen-bond is partially electrostatic in origin. In fact, high par- 
tial charges often lead to large static dipole moments, and dipole-dipole 
interactions are one of the main contributions of this inter-molecular bond. 
However, the hydrogen-bond also exhibits a covalent character [ISP*99], 
and, due to this fact, it is highly directional and can modulate the acceptor 
and donor receptivity of forming more H-bonds, allowing the creation of 
H-bond networks. 

Hydrogen-bonding is the “strongest” inter-molecular interaction due 
to its (partial) covalent character; typical binding energy of hydrogen and 
other inter-molecular bonds are reported in table 1. Due to its high bind- 
ing energy, the hydrogen-bond is the paramount inter-molecular inter- 
action of the systems where it is present, and has deep implications on 
the thermodynamic and dynamical properties. One of the most recurrent 
examples is liquid water, whose peculiar properties (which put liquid wa- 


2The electronegative atom of A is covalently bonded to H, whereas the electronegative 
atom of B is a first neighbour of H. The usual distances of an hydrogen-bond are in the 
range 1.7 — 2.5 Ä. 
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Figure 1: Three water molecules engaged in hydrogen-bonds. The lowest 
molecule acts as hydrogen-bond donor to that placed in the middle, which there- 
fore is an acceptor. Moreover, the middle molecule also acts as donor towards 
the highest one. 


Table 1: Binding energy of selected molecular and inter-molecular interactions. 
Values are merely indicative of the relative order of magnitude, and are taken 
from Ref. [Ege]. 
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ter into a completely different league compared to other liquids) are due 
to its network of hydrogen-bonds. Hydrogen-bonding is also essential 
to elucidate the behaviour of large bio-molecules, e.g., proteins and ri- 
bonucleic acids. The mechanism beneath protein folding and unfolding is 
driven by breaking and making of hydrogen-bonds, and base pairing and 
base recognition mechanism in DNA is also due to hydrogen-bonding cre- 
ation. In fact, it is exactly the intermediate “character” of this type of bond 
(strong compared to other inter-molecular interactions, weak compared to 
intra-molecular bonds), that makes it so versatile: it is strong enough to 
stabilise double helix structures that store the genetic code in DNA, but 
at the same time it is also weak enough to allow opening of this double 
helix during cell division. 

Thus, the hydrogen-bond effects comprise wide and different fields 
ranging from biology to matter physics, and a deep understanding of 
these effects is required to elucidate complex processes at the molecular 
level. 

The relative strength, lifetime and stability of the hydrogen-bond mod- 
ulate the molecular vibrations of both acceptor and donor groups, and 
are the main causes of frequency shift and broadening of the vibrational 
bands. 


Due to this relationship, the effect of the hydrogen-bond upon the 
vibrational spectrum can also be exploited to study the behaviour of the 
molecular system. Lifetimes of the hydrogen-bond provide information 
about the molecular dynamics, and structural changes can be inferred 
from the shifts and the broadening of the peaks. 

From an experimental point of view, infrared and Raman spectroscopy 
are the preferred techniques to approach molecular vibrations. These 
spectroscopies have benefited in the last decades of great progresses with 
the development of multi-dimensional techniques that allow more precise 
and comprehensive data, e.g. the possibility to distinguish homogeneous 
and inhomogeneous broadening. 

Nevertheless, the experimental techniques are not the only possible 
approaches to this topic. Quantum chemistry (“ab initio”) calculations 
nowadays are able to optimise the geometrical structure of molecules and 
clusters, and to calculate reliable vibrational density of states (along with 
infrared and Raman spectra): thus, straightforward correlations between 
structural and spectroscopic features can be analysed. 


Classical molecular dynamics allows obtaining trajectories whose sta- 
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tistical relevance can elucidate the stability of different conformations. 
Furthermore, structural changes can be monitored to predict or under- 
stand the working of active sites and other inter-molecular phenomena. 

Ab initio molecular dynamics, on the other hand, takes the best of both 
worlds: it allows obtaining both structural and dynamic properties, and 
also spectroscopic features, via the linear response theory.” Within this 
computational framework, the electronic (time independent) problem is 
solved and quantum forces are used to propagate the dynamics of classical 
nuclei. 

Unfortunately, this scheme also requires huge resources, as both com- 
putational power and memory space, so practical calculations are lim- 
ited in the size of the molecular systems. If classical molecular dynamics 
simulations of tens of thousands atoms for nanoseconds are nowadays 
routine, ab initio molecular dynamics simulations are actually limited to 
hundreds or (at most!) a thousands atoms for (at most!) hundreds pi- 
coseconds. These limitations basically rule out this approach as a mean 
to study the slow dynamics of large bio-molecular systems. Still, the pos- 
sibility to study simultaneously the dynamics of the systems, the struc- 
tural changes, and the vibrational spectra, allows accurate and reliable 
observations, even if just on medium sized systems for tens picoseconds. 
Moreover, due to the continuous increase in both computing power and 
memory space, these limitations are doomed to be ever less relevant in the 
future. So, the ab initio molecular dynamics framework can be viewed as 
“proving grounds” for development of new approaches that (hopefully) 
could be soon adopted to perform reliable “in silico experiments” even on 
large systems. Besides, it also allows to find and elucidate correlations 
and subtle effects that are important on their own, even in not-so-large 
molecular systems. 


My doctoral dissertation is devoted to study both vibrational spec- 
tra and structural properties adopting molecular dynamics simulations. 
The main link between these two aspects of the molecular behaviour, at 


3Linear response theory could also be employed with classical molecular dynamics 
trajectories, but it would give very little meaningful information due to the use of clas- 
sical force fields. Polarisable force fields can be developed in order to obtain accurate 
results, but they cannot be transferred from one molecular system to another: basically, a 
polarisable force field should be developed for every molecular system under study. 
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least for the systems simulated for the present dissertation, is hydrogen- 
bond (H-bond). The rationale behind the choice of simulating H-bonded 
systems, is that H-bonding is well reproduced by present ab initio ap- 
proaches, and it accounts for most of the induced frequency shifts onto 
localised vibrational modes. This is well exemplified by fig. 2, taken from 
Ref. [CLS04], which shows how a vibrational frequency changes due to 
the variation of a structural parameter (the H-bond distance). This two- 


@ (cm) 


Figure 2: Changing of the OH stretching frequency of HOD molecules, as a func- 
tion of the inter-molecular H. - -O distance. Distances are taken from a (classical) 
molecular dynamics simulation, while vibrational frequencies are subsequently 
obtained with a separate ab initio calculation. Figure is taken from Ref. [CLS04]. 


dimensional contour plot has been extracted from a classical molecular 
dynamics trajectory employing a heuristic model; in this dissertation anal- 
ogous contour plots shall be presented obtained at higher levels of theory 
with much more refined structure-frequency correlation tools. 

The relevance of this work, focused on H-bonding, can be better un- 
derstood remembering the words of Prof. Mark Ratner in a 2004 semi- 
nar [Bec14]: 


chemistry of the XX century was about intra-molecular in- 
teractions; chemistry of the XXI century will be about inter- 
molecular interactions. 


Overview of the Dissertation 


Chapter 1 aims to give a quick and comprehensive view of the adopted 
computational techniques. The price to pay for said simplicity is the com- 
plete absence of details, references, and all those thing that hinder a “laid 
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back” reading of the text. Details about these subjects are given in the 
appendixes and in chapters 2, 3, and 4. 

A brief description of the classical and quantum frameworks employed 
in this dissertation is given in Chapter 2. In Chapter 3 the computational 
procedures to calculate structural and time-dependent properties are illus- 
trated, making use of the tools introduced in the previous Chapter. This 
dissertation is particularly focused onto the correlation between struc- 
tural and vibrational features: so, to find and elucidate new correlations, 
a computational approach based on wavelet transform has been devel- 
oped, allowing accurate time-frequency and structure-frequency analysis, 
as shown in Chapter 4. 

Part II of this dissertation is devoted to illustrate the various case stud- 
ies where these computational means have been used to extract new in- 
formation. All the systems studied share the H-bonding feature. Links 
between theory, computation and experimental results are given in Part II 
specifically for each case study. 


In Chapter 5, a classical molecular dynamics trajectory of a substituted 
dipeptide in a phospholipide bylayer has been analysed, and the H-bond 
dynamics has been directly correlated with the measurements of two di- 
mensional IR spectroscopy. Relevant information about dipeptide pene- 
tration into the bylayer have been extracted from the simulated dataset. 

In Chapter 6, I present the results obtained from many ab initio molecu- 
lar dynamics simulations, all them regarding a glycol solute in a polar sol- 
vent (water or acetonitrile). Different spectroscopic responses arise com- 
paring simulations of different solutes, whereas in acetonitrile an unusual 
structural conformation occurs. Moreover, I have carried out additional 
ab initio (static) calculations to correctly reproduce an unusual frequency 
shift in the vibrational spectrum of acetonitrile. 

In Chapter 7, I report the results of the wavelet analysis of two ab 
initio molecular dynamics trajectories. This approach proved useful into 
retrieving a peculiar splitting of a localised mode of methyl acetate (i.e. 
C=O stretching) due to H-bond; moreover, using wavelets I have been 
able to correctly assign the two components of the doublet to their own 
corresponding structural situation. 

Chapter 8 reports the results of the application of wavelet analysis to 
probe the response of the water environment to a polar heterocyclic solute 
as emerging from an ab initio molecular dynamics simulation. The water 
response to H-bond acceptor has been investigated also in the crystal of 
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Figure 3: OH stretching frequency distribution spread along the fictitious “H- 
bond switching/breaking coordinate”. Figure is taken from Ref. [LRTO6b]. 


lithium nitrate trihydrate, as reported in Chapter 9. The simulated crystal 
is characterised by the presence of an unusual “bifurcated” H-bond, which 
is still subject of much speculation in the literature. In fact, Loparo et al. 
proposed two competitive models to describe the mechanism of H-bond 
breaking and formation [LRTO6b]: in a first model (fig.3, panel A), this 
rearrangement of H-bonds goes through a reaction intermediate, whereas 
in the second one the reaction proceeds through a transition state (fig.3, 
panel B). Loparo et al. in figure 3 schematised these two possible mech- 
anisms spreading the infrared activity along a fictitious “H-bond switch- 
ing/breaking coordinate”. In Chapter 9 I presents results that rule out 
one of these proposed models and, in turn, corroborate the other. 

Finally, a comprehensive general discussion about the findings of this 
dissertation is given in Part III, along with some concluding remarks. 
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List of Some Recurring Abbreviations (sorted alphabetically) 


ACN Acetonitrile 

BSSE Basis Set Superposition Error 

CPMD = Car-Parrinello Molecular Dynamics 

DFT Density Functional Theory 

EG Ethane-1,2-diol (a.k.a. ethylene glycol) 

FT /FFT Fourier Transform / Fast Fourier Transform 
H-bond Hydrogen-bond 

HF Hartree-Fock 

MA Methyl acetate 

MD (generic) Molecular Dynamics 

MG __ Morlet-Gabor (wavelet) 

MLWF Maximally Localised Wannier Function (centre) 
MrG N-myristoylated-methyl-glycine 

PBC Periodic Boundary Conditions 

PDO Propane-1,3-diol 

PLPC 1-palmitoyl-2-linoleyl phosphatidylcholine 
REM Replica Exchange Methods 

SCF Self Consistent Fields 

SDF Spatial distribution function 

VDoS Vibrational Density of States 

WFT Wave-function Theory 

WT Wavelet Transform 


20 


I 
Theory and Models 


1. Computational “Tools” in a Nutshell 


This very short Chapter is a quick presentation of the computational meth- 
ods employed in this dissertation. References, and a much deeper and 
verbose discussion, can be found in the following chapters of this 1° Part 
of the dissertation, as well as in the appendixes. As I said, this Chapter 
just aims to give an idea of what these computational techniques do and 
what could be obtained by them. Here I am going to tell just the absolute 
minimum of information required to approach the research case studies 
presented in the 2" Part. 


Ab Initio (Static) Calculations 


Ab initio (static) calculation basic goal is to solve the (time-independent) 
Schrödinger equation (eqn. 2.2) for molecules and, from the obtained 
wavefunctions, calculate some molecular properties. The procedure is it- 
erative and, depending on the specific level of theory adopted and the size 
of the investigated system, calculations can be more or less accurate and 
took minutes or weeks of computer time. 


(Classical) Molecular Dynamics 


Molecular Dynamics can move molecules according to (more or less) 
realistic forces. Basically, the computer simulates the molecular behaviour: 
translations, rotations, and vibrations are the main motions performed by 
a molecule. 

In classical molecular dynamics, forces are calculated by adopting a 
semi-empirical force field (see Section 2.2.5). This allows simulating a 
huge number of atoms for long timescales; anyway, reliability of the force 
field is not something to be assumed and should not be taken for granted. 

Moreover, being based on crude harmonic approximations, it gives 
little information about molecular vibrations. 
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Since the frontier research in classical molecular dynamics is simulat- 
ing even larger systems, a number of techniques have been developed to 
speed up the conformational space sampling, like for example the “replica 
exchange” method (see Appendix B.2). These techniques, however, have 
some side-effects that hinder some time-dependent analysis of the simu- 
lated trajectories, including analysis used to study molecular vibrations 
and spectra. 


Ab Initio Molecular Dynamics 


Ab initio molecular dynamics moves the molecules like classical molec- 
ular dynamics, but it does not employ a force-field. So, what moves 
molecules? In ab initio molecular dynamics the forces are obtained solving 
“on the fly” the molecular Schrödinger equation for electrons and apply- 
ing them to the nuclei (treated as classical particles). To achieve this goal, 
some mathematical and computational tricks have to be adopted (i.e. Car- 
Parrinello method, see Section 2.3.2), which have the side-effect to render 
impractical the study of systems containing more than few hundreds of 
atoms (up to almost a thousand on very powerful computer workstations). 

Nevertheless, freeing the simulation from the force field crude approx- 
imations, a lot of molecular features can be studied, including vibrations. 


Structural and Electronic Properties 


Typical calculations involving (static) ab initio calculations are the search 
for the equilibrium geometry of molecules, the electronic structure (i.e. 
the charge distribution). These features can be obtained also by ab initio 
molecular dynamics, but at an higher computational cost. 

With (static) ab initio calculations vibrational spectra are usually calcu- 
lated at the harmonic approximation level, and often neglecting the effects 
due to the environment. Since breaking and formation of H-bonds (when 
present) modulate the vibrational frequencies, this is a problematic ap- 
proximation. 

From a molecular dynamics trajectory it is possible to extract informa- 
tion about the structure of a much larger sample of molecules. The pair 
radial distribution function (see Section 3.1.1) gives the so-called struc- 
ture of a liquid and can be used in combination with angular distribution 
functions to obtain a picture of the relative arrangement of molecules. 
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Time-Frequency Analysis 


While vibrational spectra can be obtained with Fourier analysis of au- 
tocorrelation functions, wavelet transforms (see Section 4.2) allow obtain- 
ing time-dependent spectroscopic information. These can be, in turn, cor- 
related with time-dependent structural information to have a comprehen- 
sive picture of how the environment modulates the vibrational frequen- 
cies. In particular, this type of analysis has been adopted for the strongest 
of the (weak) inter-molecular forces, the H-bonding. 


Some Hints for the Interpretation 


A brief discussion about ideas and concepts used to elucidate vibra- 
tional spectra with Fourier and wavelet transforms of molecular dynamics 
trajectories can be found in Section 3.3. Basically, there it is explained how 
to recover spectra from simulated trajectories, and also some considera- 
tions about the predictive power of the harmonic oscillator approximation 
are reported. In fact, harmonic reasoning still proves useful also to have 
an immediate interpretation of anharmonic effects, since these latter can 
often be understood as due to a change in the value of the harmonic force 
constant. 
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2. Computational Chemistry 


In this Chapter the main principles of density functional theory and molec- 
ular dynamics are presented. 

The former is a quantum chemistry theoretical framework (discussed 
in Section 2.1) that allows reliable electronic structure calculations and 
many other molecular properties, with a much lower computational ef- 
forts than other ab initio methods of similar accuracy. 

The latter is a computational approach (presented in Section 2.2) that 
aims to reproduce the time-dependent behaviour of molecules, which is 
pivotal in elucidating their vibrational properties. 

Ab initio molecular dynamics (introduced in Section 2.3) combines both 
the two previous approaches: it adopts density functional theory to solve 
the electronic problem and to calculate quantum forces that, in turn, are 
used to “move” the nuclei (treated as classical particles). 


2.1 Ab Initio Calculations 


2.1.1 Molecular Quantum Mechanics 


The main aim of quantum chemistry is to solve the quantum equations 
of motion for a many-electrons system, to describe accurately the molec- 
ular properties. For a non-relativistic system, the equation to solve is the 
Schrödinger equation for the molecular state vectors (“kets”) |Y') : 


ine |" =e). > (2.1) 


here H is the many-body molecular Hamiltonian operator. The formal so- 
lution for this equation is |Y’(t)) = e"Atiyi(0)). If the Hamiltonian op- 
erator is time-independent (as usually assumed in quantum chemistry), 
then |¥') = e—"Fitly!) and the time-independent ket |) is an eigen- 
vector of the molecular Hamiltonian. This leads to the following time- 
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independent Schrödinger equation for the time-independent ket |"): 
Aly’) = Ely’), (2.2) 


where E; is the eigenvalue of the Hamiltonian and represents the energy 
of many-body quantum state |’). Assuming that no energy degeneration 
is present, there is a one-to-one correspondence between energy levels and 
ket vectors. Molecular ket vectors can be viewed as tensor products of a 
nuclear (|x)) and an electronic ket (|$)): 


9 = Ix) @ lp) =Ix)le) 


Since nuclear masses are much greater than electron mass (i.e. electrons 
kinetic energy >> nuclear kinetic energy), the Born-Oppenheimer approx- 
imation is usually adopted, which leads to a partitioning of the molecular 
Hamiltonian as a sum of two terms: 


total = Fynuctear ae electronic 


and 
pics| x") = ee x") (2 3) 
y z 
Ben gk) = pelectrónic] gk) 
k ’ 


with Etotal — Fnuclear + electronic Within this approximation, the molecular 
state space can be viewed as a direct sum of nuclei-only and electrons-only 
vector spaces 

Viotal = Vnuclei D Velectrons 


The rationale behind this approximation is that electrons and nuclei, being 
so much different in mass, move at very different time-scales and so their 
equations of motion can be solved separately, as if electrons react immedi- 
ately to changes of the relative position of slow moving nuclei. This could 
also be explained saying that, within Born-Oppenheimer approximation, 
electronic states depend only parametrically from the geometry of nuclei. 
So, nuclear coordinates {Ry} are merely considered parameters when the 
electronic problem is solved. This means that, when the nuclear geom- 
etry changes, then the analytical form of the electronic wave-functions 
$'(r;,8) = (rj,s2|') changes too. Within Born-Oppenheimer approxima- 
tion, the electronic Hamiltonian is 


HY = T¢({rj}) + VN°({Rin}) + Un a (2.5) 
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where {Rin = r; — Rn} is the difference between nuclear and electronic 
coordinates, {1;; = r; — rj} is the difference between pair of electronic co- 
ordinates, N and e are indexes to address nuclear or electronic variables, 
respectively, and T and V denote kinetic and potential energies, respec- 
tively. Adopting atomic units, Schrödinger representation and projecting 
operators and kets onto the coordinate basis (i.e. adopting wavefunction 
representation), eqn. 2.5 becomes 


1 electrons electrons electrons 
Á 


ea a 00 
i i N NN i jaa Ti 

The presence of electron-electron interaction (V“ (r;;)) in eqn. 2.6 prevents 
finding an analytical solution, so an approximated approach has to be 
used instead. Approximation methods for eqn. 2.6 can be traced back to 
two main approaches: 


1. perturbative methods, where the electron-electron interaction is con- 
sidered as a perturbation on a single-electron Hamiltonian (AH? — 
ve (r;;)); unfortunately, repulsion energy Vea) does not respect 
this assumption; 


2. variational methods, as application of the variational calculus onto 
some functional of the energy. The variational theorem in quantum 
mechanics provides the consequence that, if 


Aldo) = Eolfo) , 


then using a “test-state” |) that is not (at least, not necessarily) an 
eigenstate of H, then _ 
GA > p, 
(plp) 
with the equality standing true only if |) happens to be a true eigen- 
state of A. Hartree-Fock method (HF) and density functional the- 


ory (DFT) represent the most used variational methods in quantum 
chemistry. 


; (2.7) 


Post Hartre-Fock methods are at an higher “level of theory” compared to 
DFT methods, but they require greater computational resources, so they 
are not implemented in the ab initio molecular dynamics packages adopted 
in this dissertation (i.e. CPMD [CPM]). 


29 


Modelling of spectroscopic and structural properties using molecular dynamics 


2.1.2 Electron Density 


Due to the fact that electrons have an half-integer spin angular momentum 
(i.e. they are fermions), a couple of them cannot occupy the same eigen- 
state of both energy and component of spin (say $+). Thus, the electrons in 
a molecule are described by a set of kets |k) (k here is just a generic index 
comprehensive of both energy and spin state), each of them having occu- 


IA 


pation probability px. It is useful to use a “density operator” 6, defined as 
an external product of a ket and bra vectors, as: 


p= Do pilde) (9 (2.8) 


Since wavefunctions and their complex conjugates can be defined in terms 
of bras and kets as 


p(t) = (rlo) (2.9) 


p) = (en, (2.10) 


then the density operator can be rewritten as 


lölr) = Ar” (x T Ir’) = Leere) = el", 1’) 
K 
(2.11) 
Defining with (x1, x2,...,Xn|g) the many-electron state of a time-independent 
system with n electrons, with x; the whole coordinates of the kt? electron 
(both spatial and spin coordinates), then following eqn. 2.11 the density 
p(r) for the electron #1 can be written as 


e(r) = mf on [ sided p(x xas m)? . (2.12) 


Still, because all electrons are equal, then it has to stand true also that 


Jao) =j 3 (2.13) 


thus, the electron density of 2.13 is a non-negative function of just the 
three spatial variables (r = (x,y, z)). 
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2.1.3 Hohenberg-Kohn Theorem 


A molecular system of n electrons whose Hamiltonian is defined by eqn. 
2.5, is completely determined (at least from a mathematical perspective) 
by the potential V^? (also referred to as “external potential” in [PY89]). 
We can say that the pair n and V^? determine all the properties of the 
eigenfunctions and all the features of the ground state of the system. In 
place of n and V^”, the first Hohenberg-Kohn theorem considers the elec- 
tronic density p(r) as the paramount quantity, as the following states: 


VN® is determined, albeit a additive constant such as the inter- 
nuclear potential VN (it is considered a parameter in this 
case), by p(r). Since p(r) determines n (see eqn. 2.13), then 
p(r) completely determines the electronic ground state and all 
its properties. 


The “ab absurdo” demonstration of this relevant statement is reported in 
Appendix A.1 along with observations on the second Hohenberg-Kohn 
theorem. 


2.1.4 Kohn-Sham Equations 


Equation A.6 (see Appendix A.1) implies the existence of a variational 
principle of the form 
Elo] =0 , (2.14) 


which, in turn, has as a consequence eqn. 2.7. Since the density p is 
constrained by the condition f p(r)dr — n = 0 (eqn. 2.13), the variational 
principle takes the form 


öf Elo] =i [fear-na] } =i) . (2.15) 


The u coefficient can easily be obtained noting that 


öf Elo -y |J eoar- n|} = öE[p] — uô | ear- n] =0 


For many-variables functions, the differential can be written as 


df(q', | eee ae = Dar 
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Exploiting the analogy between functionals like F[f(q', q’,...,q',...)| and 
functions such as f (q', a er oe ..,g"), it is possible to write the follow- 
ing equality 


= oF [f] 11.2 i n. 
or= f...f if dq dg’ dd cdg” 3 


then, inserting ÖE|p] into the previous equation we obtain 


oE(p] _ f f Elo! = 
Fate) dp(r)dr — u f öo@)dr = f E = 2 dp(r)dr = 0 


Moreover, we also have 


öEle] 
öp(r) 


here the definition of the coefficient u matches that of chemical potential, 
and explicit calculations leads to 


_ SElP| _ nern | SEP] _ nern, AT] + VEN} 
re en a 


Since in Schrödinger representation the VN® operator is just a power of the 
electronic coordinates and does not contains mixed terms like rjj = rj — rj, 
we have 


(glVNe|p) = Vp] = f Arple)VNERN} = r) 


Anyway, it must be pointed out that the explicit exact form of functionals 
T’[p] and V“ |p] is unknown and cannot be deduced analytically. In fact, 
T’[p]l and V“[p] functionals contain operators that act on the wavefunc- 
tions changing these latter up to the point that an analytical expression 
for the density p(r) cannot be obtained. Thus, Kohn and Sham did several 
assumptions to make the equations more manageable. 


1. they introduced in eqn. 2.17 a kinetic energy functional that is given 
by the average kinetic energy of non-interacting electrons T[o]; 


2. they adopted a purely Coulombian functionals in place of V® [p] 
obtained by classical electrodynamics (the so called “Hartree poten- 
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tial”) ! that is basically the bi-electronic Coulomb integral of the HF 
method: 


1 
Jij = i, u = 


— Hartree _ — J= p( 11)p p(t )p(t2) gy idr ; 


3. they put all corrections to the previous a assumptions into 
a new functional Exco] , called “Exchange-Correlation Energy”. 


With this approach, it is correct to write 


Tip] + f drot) yvNe 4 len 2a dr +Exclo] . (218) 


Hartree A 


Putting this expression for E[p] into eqn. 2.17 we have 


_ ĉElel _ ETlel | ye + foe „+ $Exclp] 
öp(r) TI Ir < ôp(r) 
It is useful to define two new potentials, Vy. and yet usually called 


“Exchange-Correlation Potential” and “Effective Potential” respectively, 
which depend from r and p(r) and are defined as: 


V= ae (2.19) 
and / 
vet — VN(r) + er dr + Vye . (2.20) 


1 Adopting a plane wave basis, it is better using the Laplace equation 


1 
2 Hartree 
y =. 
V zel) >» 
and then projecting it into the Fourier space 
47 

y Hartreelk) = m Fe] ; 

here with F[p] we refer to the Fourier transform of ö(k) « f drp(r)e”'K" of the density 


e(r). Calculation of this Fourier transform is quite easy in ae wave w as adopted in 
the CPMD [CPM] program. 
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Still, a “recipe” to calculate the electronic density still has to be presented. 
To overcome this issue, Kohn and Sham assumed [KS65] that the density p 
could be calculated from single-electron wavefunctions @/ of the following 
eigenvalue equation (which is obtained in Appendix A.2) 


ites ee 1 
hat = hg! = Rachael ze. (2.21) 
Here the total density is calculated from the many-electrons wavefunc- 


tions g, which, in turn, are constructed as a Slater determinant of the 
single-electron wavefunctions: 


PX, X2, Xn) = = det 1019..0" (2.22) 
P(t) = 1, SLR - (2.23) 
J 


The last equality is due to the fact that the square modulus of the Slater 
determinant of eqn. 2.22 (x1, X2, -.., Xn) is a sum of single-electron orbitals. 
Thus it is obtained 


Le = igi |- jv + ve 6‘) Tip] + f vwe dr . (2.24) 


Remembering equations 2.20 and 2.18, the total electronic energy can fi- 
nally be expressed as 


Berpjeye -3 ff ere? > ard - J Veel rìdr+ Ere . (2.25) 


3 


2.1.5 Basis Sets 


Talking about basis sets in quantum chemistry eventually leads to talk- 
ing about Gaussians. Why Gaussian functions are so widely employed in 
electronic structure calculations? Basically for three very important rea- 
sons: 


1. the overlap of two Gaussians is a Gaussian; 
2. the Fourier transform of a Gaussian is a Gaussian (this property shall 


be invoked again in section 4.2.1); 
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3. with the “fast Fourier transform” algorithm, performing Fourier 
transforms consumes very little computational resources. 


Before computer-era, most quantum chemistry calculations were done 
employing Slater-type orbitals (STOs), which are basically hydrogen-like 
eigenfunctions with an effective nuclear charge. 

For the three reasons presented before, today it is normal practice to 
simulate a Slater-type orbital with a linear combination of Gaussians. For 
example, with so called “STO-nG basis”, the n value represents the num- 
ber of Gaussian primitive functions comprising a single Slater-type orbital. 

In this dissertation, the most widely employed basis for static ab ini- 
tio calculation is the 6-311++G(d,p) basis set (whereas, as discussed in 
Section 2.3.4, the native basis set for CPMD are the plane waves). This 
confusing abbreviation means that 6 Gaussians are used for every core 
electron Slater-type orbital, whereas the three digit number “311” means 
that the valence electrons are described by a combination of three orbitals, 
the first simulated by 3 Gaussians and the other two by a single Gaus- 
sian each. Moreover, polarisation orbitals are added as virtual |d) and |p) 
orbitals, as well as diffuse functions (denoted by the “++” symbol). The 6- 
311++G(d,p) is a rather accurate basis set currently employed in quantum 
chemistry calculations. 


2.1.6 Hartree-Fock VS Kohn-Sham 


The Kohn-Sham approach can be schematised as follows: 
1. single-electron orbitals $' are guessed; 
2. eqn. 2.21 is tentatively solved; 
3. eigenvalues e; and density p = X; |¢'|? are calculated; 


4. with e; and p = Y; |¢'|*, the functional E is calculated adopting eqn. 
2.25; 


5. E is minimised changing some parameters of the original orbitals $'; 


6. return to step #2 and continue until the e; eigenvalues of tentatives 
C and C + 1 are as close as desired. 


This is basically the standard “recipe” of any self consistent field (SCF) 
approach. DFT has both pros and cons compared to the HF procedure. 
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e In HF theory, a single Slater determinant wavefunction is build. 
Then, at the post Hartree-Fock level ?, correction to it (to recover 
the so called “correlation energy”) are introduced considering ex- 
cited states and, overall, introducing additional Slater determinants 
to the many-electrons wavefunction. Hartree-Fock as well as post 
Hartree-Fock methods are nowadays collectively called “wavefunc- 
tion method/theory” (WFT) [Bec14]. 


The rationale behind the Kohn-Sham approach is different: calcula- 
tions are performed onto reference system of electrons that interact 
with each other only through the V,. term in eqn. 2.20 (it is often 
said that they are “non-interacting electrons”, meaning that no term 
in the Hamiltonian of eqn. 2.21 couples their coordinates). This 
reference system is represented only by a single Slater determinant. 


Hence, it can be proved (see ref. [Bec14] for more references) that if 
Vxc is exact, then the density of the reference systems is equal to the 
true many-electrons density. This is the main connection between 
WFT and DFT: the theories have different hamiltonians, but should 
yield the same densities. 


Basically, if Vy: is exact DFT yields true ground state properties with- 
out expanding the many-electrons wavefunction into a space com- 
posed by many Slater determinants: just one is enough. 


e From a theoretical point of view, the DFT actual implementation sel- 
dom are truly “ab initio”. Even if Kohn-Sham equations (eqn. 2.21) 
are exact, the adopted expressions for V,. in chemical relevant cal- 
culations usually contain terms and parameters that are empirically 
derived and not theoretically justified. On the contrary, within HF 
theory, only mathematical approximations are allowed; 


e DFT equations are much more easily solved than post Hartree-Fock 
ones: if n is the number of electrons in the system, then DFT calcula- 
tions can be proportional up to n°, whereas a corresponding post-HF 
calculation can be proportional up to n°. So, doubling the number 


Post Hartree-Fock calculations basically start with a simple HF calculations and then 
introduce corrections with some perturbative method (e.g. “Many Body Meller-Plesset 
Perturbation Theory”) or with Configuration Interaction (e.g. CI or Coupled Cluster meth- 
ods.). 
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of electrons a DFT calculation becomes 8 times slower, but a post-HF 
one becomes 64 times slower! 


e DFT does not suffer from lack of correlation energy, which, con- 
versely, is absent into HF method. To recover correlation energy 
within a HF-based scheme one has to introduce further corrections 
(“post-Hartree-Fock”) that make the calculations much slower; 


e a final comment: to derive the HF equations, one starts from an 
exact many-electrons Born-Oppenheimer Hamiltonian and arrives 
to an approximate single-electron Fock Hamiltonian. Then one can 
correct this result with some post-Hartree-Fock approach to recover 
electron correlation (assuming one has the computational power to 
do so) and come close at pleasure to the desired result. 


The DFT approach is quite the opposite! Kohn-Sham equations are 
exact, but the exchange-correlation functionals adopted in practice 
are not. 


In HF-based methods one knows (at last, theoretically) how to im- 
prove results. To do the same within DFT, one should develop a 
new (and better working) exchange-correlation functional. It does 
not surprise that the development of new functionals is still an ac- 
tive research field in quantum chemistry. 


Indeed, a wise choice of 
OE xe 


Vee ôp (r) 
determines the accuracy of the DFT calculation. For a system of free and 
(almost) non-interacting electrons, Vye depends only on the electron den- 
sity pọ. Anyway, for real electrons, Vye depends not only on p but also on 
further derivatives of it, so that 


Vie = Vrelo(t), VER), V (Vo(r)), +] 


A discussion on functional forms and properties is reported in Appendix 
A.3. 


2.2 Classical Molecular Dynamics 


Molecular Dynamics (MD) calculations allow simulating the behaviour 
of molecules at atomic scale, solving numerically the proper equations of 
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motion. When these equations are Newton’s, the method is called classical 
Molecular Dynamics. In this Section the principles of classical and general 
MD are presented. 

Since both classical and ab initio MD share a lot of features, many 
aspects of the former still apply to the latter and will not be discussed 
again in Section 2.3. 


2.2.1 Equations of Motion 


Classical MD has the aim to solve the Newton equation 
F = Mkak (2.26) 


for all N particles of the system (which are represented by the k index). In 
a classical MD simulation, the atoms (nuclei and electrons) are the parti- 
cles that move and on which the calculated forces are applied. The forces 
derive from the so called “force fields” (see Section 2.2.5), which take into 
accounts both electrostatic and dispersive contributions. The electrostatic 
forces are due to the fact that, even in (neutral) molecules, local charge can 
exceed zero and so partial charges are attributed to the atoms (or pseudo- 
atoms °). Dispersive forces are due to higher multipole interactions and 
weak polarisation effects. H-bond is not accurately described with a stan- 
dard classical force field, due to the inability of the latter to correctly re- 
produce the covalent character of this type of bond. Still, the bulk of the 
electrostatic H-bonding interaction is retained, enough to roughly recover 
at least the structure of H-bonded systems. Further details on force fields 
are given in Section 2.2.5. Calculating energies, forces and displacements 
represent the main computational burden of a MD simulation. Anyway, 
before discussing this issue, it is useful to put the Newton equation in a 
more convenient form. In fact, eqn. 2.26 could be rewritten in Cartesian 
coordinates as: 


Bee. E. ae (2.27) 


3Pseudo-atoms are basically excesses of charge permanently put on positions around 
the molecules where there are not true atoms. For example, in the so called TIP4 force 
field for liquid water [JCM*83], an additional negative charged dummy atom is placed 
near the centre of mass of the molecule (close to the oxygen atom). This procedure is 
used to parametrise complex electrostatic responses when simply putting charges onto 
real atoms cannot reproduce known effects. 
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This form is not invariant with respect to a change of coordinates, which 
will be relevant in presenting the Car-Parrinello method, so it is better to 
adopt Lagrangian and Hamiltonian formulations in order to express the 
equations of motion in generalised coordinates. The Lagrangian £ is a 
function that depends only on the generalised coordinates {q’} and on 
the generalised velocities {q'} of the system (with i = 1,...,3N). The 
Lagrangian satisfies the following equations 


oL dor 

T — ET =0, (2.28) 
which are obviously called “Lagrange equations”. They represent a sys- 
tem of 3N partial differential equations, one equation for every i coor- 
dinate. When only conservative forces are present (i.e. systems where 
F = —VV, with V potential energy of the system depending only on the 
coordinates), the classical Lagrangian takes the form 


£L=T(q)—-V(q) , 
with T kinetic energy of the system. The momentum is defined as 
OL 
a= aq’ 
and, inserting eqn. 2.29 into eqn. 2.28, its time-derivative assumes the 
form 


(2.29) 


» dəl ƏL 

P= rag a 
It is particularly helpful to express the equations of motion as depen- 
dent on a function that coincides with the total energy of the system. 
Moreover, equations 2.28 constitutes a system of second-order partial dif- 
ferential equations, whereas in many fields it would be simpler to ap- 
proach first-order partial differential equations. The required function for 
these equations is the Hamiltonian H, which is defined as the Legendre- 
transform of the Lagrangian‘ : 


(2.30) 


oN OL si isi 
Ma Lg) : (2.31) 


*Formally speaking, the H function of eqn. 2.31, which is a function of coordinates 
and velocities, is not the same function of equations 2.32 and 2.33, which depends on coor- 
dinates and momenta. The two functions are numerically equal (at least for conservative 
forces), so it is not really misleading to adopt the same symbol for both of them. 
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With some manipulation, and exploiting equations 2.29 and 2.30, it is pos- 
sible to demonstrate the following Hamilton equations 


OH. : 
og —Pi (2.32) 
and 3%, 
dp; =+q , (2.33) 
with 


H(p,q) = T(p) + V(q) = E" 


Equations 2.32 and 2.33 form a system of 6N first-order partial differential 
equations, as required, and the Hamiltonian H function is numerically 
equal to the total energy. 


2.2.2 Integration Scheme 


Hamilton equations 2.32 and 2.33 shall be helpful in the Appendix B.1, 
where a general approach to derive numerical integrators is presented. In 
this Section, I present a simplified approach (for example, see [Tuc10a]) 
to obtain just one class of numerical integrators, the so called “velocity 
verlet”. 

The position q of a particle at time t + At can be expressed as a Taylor 
series of the position at time t. Dropping all terms of the Taylor expansion 
higher than second order in At, we obtain 


g(t + At) © a(t) AC i At + Fal) (AN? 


Knowing that q(t) = v(t) and q(t) = F(t)/m, the previous equation can 
be rewritten as 


1 
q(t + At) ~ q(t) + v(t) At + FÜ): (At)? . (2.34) 
Nothing prevents us from writing an analogous equation to recover q(t) 


from data at time t + At, with a time-leap equal to —At, obtaining 


q(t) © q(t+At) +vE+A8) - (-A8) 4 5 F(t+ At) (At)? . (2.35) 


Now we can substitute eqn. 2.34 into eqn. 2.35, obtaining 


0 = —v(t + At) - At + F(t + At) - (At)* + v(t) At + =F (t) - (At)? 
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that can be rearranged as 


v(t At) = v(t) + EG) HEE AN] At (2.36) 
Since the state of a classical system is a point in the phase space, then the 
pair of eqn. 2.34 and 2.36 can propagate in time the state of the system 
to produce a dynamics. Here, eqn. 2.34 and 2.36 are expressed in terms 
of coordinates and velocities, but they could be reformulated in terms of 
coordinates and momenta to match eqn. B.8 as obtained in Appendix 
B.1, where a deeper discussion about the advantages of this particular 
integration approach can be found. 


2.2.3 Periodic Boundary Conditions 


In a MD simulation, atoms, molecules and ions are put into a box > and 
their equations of motion are solved as explained in the previous Sections. 
Since in a simulation the number of particles N cannot be infinite, in 
order to reproduce the properties of a bulk and extensive environment, 
periodic boundary conditions (PBC) have to be employed; otherwise, only 
simulations in vacuum could be performed. With PBC, other cells are 
constructed around the simulation box, equal to the latter, and equations 
of motion are solved only for the “true” simulation box, whereas particles 
in the other cells are just images of the N particles inside the original 
box. For a cubic cell, this means that when a molecule exits from the true 


Figure 2.1: The green particle exits through the right side of the red cell, but its 
“image” enters from the left cell. This is repeated over all cells. 


5The box does not necessarily has to be cubic. 
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box thought the right side, then its image enters again into the same box 
through the left side, as showed in fig. 2.1. Adopting PBC has, however, 
some minor and yet undesired consequences: 


e angular momentum is no more conserved. In fact, since the angular 
momentum L is the vector product 


N 
L= Lro NP > (2.37) 
1 


with i being particle-index, it is self evident that when a particle exits 
and its image re-enters the simulation box, its velocity has changed 
direction and this, in turn, changes the sign of the vector product; 


e spurious correlation between particles and their images are intro- 
duced into the simulation. Nevertheless, if the simulation cell is 
large enough hopefully this effect would have little significance. 


2.2.4 Ergodic Hypothesis 


Since (q,p) is the classical 6N component vector of the N particle state, 
then the mean value of a quantity 0(q, p) is 


(2.38) 


where f(q,p) is a function representing the density of states in the phase 
space. f(q,p) assumes different forms depending on the specific statis- 
tical ensemble. ° Integrating on q and p in classical mechanics means 
taking the average on the states of the systems. Conversely, in quantum 
mechanics the classical integral has to be replaced by a sum of quantum 
states. 

The so-called “ergodic hypothesis” allows to correlate simulation re- 
sults with measurable thermodynamic properties (i.e. time independent 


6For example, in the micro-canonical ensemble NVE, where the degree of freedoms 
(N), the volume (V) and the total energy of the system (E) are all constants, the p function 
is a Dirac’s delta (Etot — H(q' ,Pi)), which means that states with energy different from 
the constant value E do not contribute. 
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ones). Returning to the general quantity 0, the ergodic hypothesis let us 
write its mean value as 


1 tmax 
= li t)dt 2. 
(= tim | — [™ aa (2.39) 
without performing the difficult task of integrating on classical phase- 
space states, as eqn. 2.38 would prescribe. Since in (equilibrium) MD 
one usually assumes that the ergodic hypothesis stands true, is helpful to 
pinpoint some relevant peculiarities of this hypothesis: 


e the hypothesis imply that in MD the mean on the states is numeri- 
cally equal to that on the time-steps, if the simulation is long enough; 


e for long enough simulations the system completely explores the 
available phase-space; 7 


e there is no guarantee that a simulation run is long enough to have 
reached the ergodic limit. 


Anyway, previous experiences and simulations broadly confirm that for 
fluid systems up to a thousand degrees of freedom (the typical value for 
an ab initio MD simulation) simulation run of tens of picoseconds are usu- 
ally enough, and for systems up to a 100 x 10° degrees of freedom (indica- 
tive value for classical MD simulations) usually runs of tens or hundreds 
of nanoseconds are required. To speed up the MD achievement of the er- 
godic limit, many computational strategies were developed to improve the 
phase space sampling, like the replica exchange method (REM) discussed 
in Appendix B.2. 


2.2.5 Force Fields 


The reliability of a classical MD simulation depends crucially on the choice 
of the force field. Basically, the force field contains all the potentials of 
the many intra- and inter-molecular interactions, which are usually called 
“bonded” and “non-bonded” potentials. 

Bonded potentials include stretching and bending interactions, both 
modelled with a quadratic potential of the appropriate coordinate. The 


7For a NVE simulation, this imply that the available phase-space is such that the 
potential energy V(q) is less (or equal) than the total energy E(q, p). 
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stretching interaction changes as ~ ks ra with r being the distance between 
two covalent bonded atoms i and j. The bending interaction changes as 
~ yO igjy with a being the angle between two atoms (i and j), both co- 
valently bound to a third (k). Bonded potentials also includes torsional 
effects that are often modelled as Viorsion ~ Kg[1 + cos noir], where ix; 
represents the torsional angle between the 7 and j atoms and n is an in- 
teger that usually assumes the value 3. The constants ks, kp, and Ky have 
force-field specific values and are usually deduced empirically from spec- 
troscopic and thermodynamic measurements. The total bonded potential 
is computed as a sum of two-atoms terms. 

Non-bonded potentials determine the inter-molecular interactions and 
usually include electrostatic and Lennard-Jones terms: 


Lennard-Jones 


12 6 
1 dig; ij Ci 
Vane bonded = > 3, k— de, — = — P (2.40) 
i jzi Go fj 
electrostatic 


The electrostatic interaction shall be discussed in more detail in the fol- 
lowing Section 2.2.5. The Lennard-Jones potential is adopted to model all 
inter-molecular interactions weaker that the electrostatic one, i.e. mainly 
van der Waals interactions. The g; parameter is approximately the van der 
Waals radius of the atom i, and usually o;; is calculated as the arithmetic 


mean 
0; +o; 


pl y 
whereas e;; is often computed as a geometric mean 


Ej = fEiej , 


where -e; is the depth of the energy well at the minimum of the Lennard- 
Jones potential (which, in turn, is located at fmin = 21/60;) for the atom i. 
The two previous equations are often referred to as “Lorentz-Berthelot 


Oi; = 


1 
mixing rules”. Double sum 5 Li Ljzi is adopted to avoid 1-2 and 1-3 pairs 


into equation 2.40 ® (i.e. to exclude stretching and bending related pairs 


1 
8The double sum 5 Li } jz; is equivalent to Lj Yj>i- 


44 


Francesco Muniz Miranda 


from the non-bonded potentials); anyway, 1-4 pairs (related by torsional 
potentials) are somewhat considered at the border-line between bonded 
and non-bonded pairs. Some force fields, such as AMBER [HAO*F06], 
take into account 1-4 pairs into eqn. 2.40, albeit with a dumping factor that 
reduces their relevance, whereas other force fields like CHARMM [FJ00] 
exclude these pairs in the sum of eqn. 2.40. 


2.2.6 Long Range Interactions 


As shown in Section 2.2.3, PBC allow performing bulk simulation avoid- 
ing border effects, but on the other hand the use of PBC implies a cut- 
off of all interactions. This is not a relevant problem for interactions 
modelled with a Lennard-Jones potential, since they decay quickly with 
the inter-atomic distance r;j. This is equivalent to say that the series of 
Lennard-Jones potentials converges within the cutoff radius, but the same 
is not true for electrostatic interactions, so a cutoff of these could generate 
energy-conservation issues as well as severely unrealistic forces. To over- 
come this setback, different approaches were developed, including the 
Ewald Summation method. In the Ewald scheme, the electrostatic series 
is partitioned into two new different series (on more than one simulation 
cell) that are absolutely convergent (see for example Ref. [F502], Chap- 
ter 12). Details of this approach are really tedious and can be found in 
the Appendix of Ref. [Tuc10a], pp.652-662. However, it should be pointed 
out that Ewald-based approaches introduce spurious correlations between 
particles, since the sum is extended to particles of different cells. 


2.2.7 Effectiveness and Shortcomings of the Method 


Classical MD is a powerful tool to reproduce the behaviour of many 
molecular systems. In fact, by means of MD calculations it is not only 
possible to recover the same structural data obtained with other simu- 
lation techniques (i.e. Monte-Carlo), but also to obtain time-dependent 
properties, as will be discussed in Chapter 3. This dissertation is mainly 
devoted to calculate vibrational properties, so the Monte-Carlo approach 
could not be employed. 

It has to be pinpointed that the trajectories generated by classical MD 
are not necessarily close to the “real” molecular trajectories: force fields 
contains semi-empirical parameters and are mostly based on very crude 
approximations (e.g. harmonic approximation for bending and stretch- 
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ing potentials) that produce significant divergences. So, how it comes 
that classical MD is widely employed? Basically, what is most relevant in 
molecular simulations is to acquire not the “real” behaviour of molecules, 
but a “realistic” one that is sufficiently reliable to elucidate empirical 
datasets. 

Serious shortcomings of classical MD simulations (particularly in re- 
lation to the purposes of the present dissertation) are listed in the follow- 
ings. 


e They cannot reproduce chemical reactions, which, on the contrary, 
can be obtained in ab initio MD trajectories; this is due to the fact that 
the force field adopted does not allow the alteration of molecular 
connectivity. 


e Moreover, since the force fields determines the interactions once and 
for all, polarisation effects are largely lost, so anharmonic effects in 
vibrational spectra cannot be reproduced. 


e As a consequence of the previous observations, even the effects of 
H-bonding (which retain a covalent character) cannot be correctly 
described. 


It is true that with classical MD many observations regarding the sta- 
tistical behaviour of the H-bond can still be made, but the effects of this 
bonding onto the vibrational spectra are mostly lost. It is possible to 
develop “polarisable force fields”, where the parameters are allowed to 
change according to the dynamic behaviour of molecules, but these force 
fields are optimised for the specific system to simulate and cannot be eas- 
ily transferred from a system to another; besides, they are usually hard 
and time-consuming to be developed. 

For these reasons, this dissertation mainly employs ab initio MD, which 
(in theory) provides the best of both static ab initio calculations and MD 
simulations. In fact, ab initio MD takes into accounts realistic electronic 
structure and is therefore able to retain polarisation effects, overcoming 
the three issues just made. 

The price to pay to use ab initio MD is its larger computational cost. 
This is mainly due to the increase in the degrees of freedom of the system 
(as shall be shown in Section 2.3), which forces to simulate only a limited 
number of atoms and for relatively short time (up to a thousand of atoms 
and few hundreds of picoseconds at best, respectively). 
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So, ab initio MD is at a higher “level of theory” than classical MD, but 
the complexity of the systems that can be realistically simulated decreases. 
Indeed, classical MD is still the suitable computational tool to investigate 
large molecular aggregates, such as proteins and other large biomolecules, 
whereas ab initio MD is mainly employed to study in detail systems made 
up of hundreds of atoms. 


2.3 Ab Initio Molecular Dynamics 


This Section wants to show how DFT and classical MD combine to create 
a dynamics of nuclei that move according to realistic quantum forces.? 
There are various ab initio MD approaches: in the following, I shall briefly 
present the Born-Oppenheimer MD, which provides much of the theoret- 
ical background of the whole subject, and then the Car-Parrinello MD, 
which is the method more extensively employed in this dissertation. 


2.3.1 Born-Oppenheimer Molecular Dynamics 


The core concept of ab initio MD is to treat nuclei like classical particles 
(just like in classical MD) but subject to forces obtained by ab initio cal- 
culations on electrons. Born-Oppenheimer MD is a conceptually simple 
approach to ab initio MD: at every time-step the ab initio equations for 
the electrons are solved (e.g. the Kohn-Sham equations of DFT) obtaining 
the eigenfunctions $; that minimise the electronic energy and calculating 
the Slater determinant 9; then, semi-classical forces are obtained via the 
Hellman-Feynmann theorem adopting equation C.2, !" here reported for 
completeness, 


F; = py = -Vr (p |A| p) (2.41) 


where Vr, represents the gradient with respect to the coordinates of nu- 
cleus k. Minimising the total electronic energy (at every MD time-step) 
means that the electronic system has to be onto the Born-Oppenheimer 
energy surface, which, in turn, depends parametrically onto the nuclear 
coordinates {Ry } and where the electronic ground states for each specific 


?Obviously, with the term “quantum forces” I refer to semi-classical forces, analogue 
to those obtained via Hellman-Feynmann theorem. For an accurate discussion of this 
theorem, and its relation with ab initio MD, see Ref. [MH09], pp. 52-56. 

10A short presentation of this theorem is reported in Appendix C.1. 


47 


Modelling of spectroscopic and structural properties using molecular dynamics 


nuclear configuration lie. An approach similar to this is adopted, for ex- 
ample, in the CP2K [VKM*05,CP2] ab initio MD code, which adopts a mix 
of plane waves and Gaussian functions to expand the electronic wavefunc- 
tions. Programs like CP2K have not been employed in this dissertation to 
perform ab initio MD, because they often show a sub-optimal energy con- 
servation, with Gaussians associated to some issues (e.g. BSSE problems), 
as discussed in Appendix C.2. 

In the eighties, M. Parrinello and R. Car developed an alternative ap- 
proach that is computationally more efficient since, as explained soon, it 
requires just one wavefunction optimisation. 


2.3.2 Car-Parrinello Molecular Dynamics 


Parrinello and Car, in order to avoid to repeat the wavefunctions optimisa- 
tion at every MD time-step, postulated in [CP85] the following Lagrangian 
L° for a system of quantum electrons and classical nuclei: 


potential energy 
occ ——, 


1 . 1 . BR 
LE = n ),Mx(Rn) + z Y ulo | d') - Elp; Rn] + constraints 
N i 


N 


kinetic energy 
(2.42) 
In equation 2.42: 


Rn are the nuclear coordinates; 

Mn are the nuclear masses; 

Q! is the Kohn-Sham wavefunction for the i electron; 

p is the total electronic density built from single-electron wavefunctions; 
u is a parameter that is called “electron fictitious mass”. 


The “constraints” in eqn. 2.42 mean to assure normalisation and orthogo- 
nality of the wavefunctions and take the form 


LIA Gilg’) -— 34), 


ij 
where 


Al is a Lagrange multiplier 
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(Qili) — dj; is the ortho-normality constraint for single electron wave- 
functions; 


E is the total electronic energy E[p; Ry] obtained through eqn. 2.25 (which 
depends parametrically on nuclear coordinates) and the total nu- 
clear repulsion energy. 


In eqn.2.42 the electronic wavefunctions take the role of system coor- 
dinates since their first derivatives ¢' are part of the kinetic energy term 
of the Lagrangian. !! Indeed, the term 


um bild‘) =. m fa a — So far’ 


formally resembles the classic kinetic energy T = 3%; m;x? and hence is 
called “fictitious kinetic energy” of electrons. Adopting the Car-Parrinello 
Lagrangian, the corresponding equation of motion (eqn. 2.28) for the elec- 
tronic wavefunctions becomes 


do 
pen ES pe; (2.43) 
na ~ ap 
whereas for the nuclei 
d ð „.p__9 „cp 
u Rye = > (2.44) 


Performing the calculations explicitly for the wavefunctions equation, we 
obtain 


d oi _ f vi = _ OE Ele; Rn] Rn] i ir 
ee T LA P(r t) , (2.45) 
and, employing egn. 2.25 and 2.24, 
dElo; RN x 
ER] L Ag), 
pk 
thus, eqn. 2.45 reduces to the simple form 
wid! (r,t) = -AS gi (r, t) + LA“ (2.46) 


Sum 1° is limited to occupied Kohn-Sham orbitals only. 
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For the nuclear coordinates we obtain 


dE[p; Rn] 


MyRw = — ORN 


(2.47) 
Equations 2.46 and 2.47 describe the time evolution of nuclear positions 


and electronic wavefunctions in Car-Parrinello ab initio molecular dynam- 
ics (CPMD) [CPM, CP85]. 


2.3.3 Explanations 


As can be seen, CPMD approach is very different from Born-Oppenheimer 
MD and it is not immediately apparent why it should produce comparable 
results. This Section provides explanations of the method and shows some 
basic features of the CPMD procedure. 

In Born-Oppenheimer MD the electronic quantum problem is solved 
whenever the nuclei are moved (i.e. at every MD time-step): the wavefunc- 
tions ¢' evolve in time only thanks to the SCF calculations that quench 
them again onto the Born-Oppenheimer energy surface. On the con- 
trary, within the CPMD approach the electronic wavefunctions are time- 
dependent variables !?: they are calculated (and quenched onto the Born- 
Oppenheimer surface, i.e. optimised) just once before the dynamics starts, 
and then they evolve in time like classical coordinates according to eqn. 


12]t is necessary to specificate that in the CPMD scheme the time-independent 
Schrödinger equation 2.2 is not solved at every time-step, nor the time dependent 
Schrödinger equation 2.1, but rather the algorithm simply propagates the wavefunction 
in time according to equation 2.46. To really solve eqn. 2.1, an approach like “time de- 
pendent density functional theory” (TD-DFT) should be employed, which is based on the 
following equation 


oe ie (¥ (8) jing = a [¥(t))dt = 0 


dp(r,t) . 
so that 
sai eff pi 12 t elr,t) y , JExc(t,t)] ; 
er = = | yex t)] 4 dr 4 ir 
ns P(e) = = -5V + vole A] f Ed + | iy 
where V®t is a new time-dependent functional. TD-DFT still poses some theoreti- 
cal (e.g. time evolution depends on both the density current j = -ih[Y*(t)VY(t) — 


(VY*(t))Y(t)]/2m and the scalar density p) and practical challenges (can be very time- 
consuming, and the analytical expression of reliable time-dependent functionals is difficult 
to obtain). A brief discussion of this approach is reported in Ref. [Mar04], pp. 147-149. 
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2.46. In fact, eqn. 2.46 is integrated with a velocity verlet algorithm! 
(see equations 2.36 and 2.34). Thus, the price to avoid optimisation at 
each time-step is to increase the number of degrees of freedom, as briefly 
anticipated in Section 2.2.7. 

In order to assess the reliability of this approach, CPMD generated 
trajectories and other ab initio MD trajectories should be close to each 
other: in other words, the single electron wavefunctions ¢' have to evolve 
in time, remaining near the Born-Oppenheimer energy surface. Within the 
CPMD approach, this latter requirement is not always assured, but under 
some suitable conditions this requirement can be made true in practice. 

The Lagrangian reported in eqn. 2.42 shows an “unusual” kinetic en- 


ergy term, 
occ 


1 EEE 
see lg) 


containing the fictitious mass of the electrons (u parameter) and the “ve- 
locities” ¢'. The u parameter has physical dimension of mass x length 
and its actual value is (in theory) arbitrary, but it has profound effects on 
the dynamics. With a “large” value for u, the dynamics of the wavefunc- 
tions will be particularly slow, which means that the wavefunctions will 
adapt slowly to the changes occurring in the nuclear configurations; con- 
versely, if u has a “small” value, wavefunctions will respond quickly to 
nuclear conformational changes, allowing the electronic system to remain 
near the Born-Oppenheimer surface. '* In other words, a large u leads to 
a stiff wavefunction dynamics, whereas a small u leads to a more flexible 
wavefunction dynamics that will adapt to the nuclear dynamics without 
depart too much from the electronic ground state. 

If u = Mn, then the wavefunctions dynamics will be very dependent 
on the nuclear dynamics (i.e. electrons wavefunctions and nuclear po- 
sitions are strongly coupled) and a flow of energy between nuclear and 
electronic degrees of freedom occurs. This has to be avoided since it un- 
dermines the dynamics of molecules. Instead, if u; < My, then the dy- 
namics for ¢' and Ry will be almost completely decoupled, which is a 
good thing since it allows to remain close to the electronic ground state. 


13In the CPMD routines the velocity-verlet is coupled to an iterative scheme called 
“SHAKE?” [RCB77] that ensures that wavefunctions remain | to each other. 
141 arge” and “small” have to be understood in relation to the mass of the nuclei 


{Mn}: 
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However, there is a limit to how small u can be chosen: if the dynamic of 
the wavefunction becomes too fast, the time-step to integrate the equations 
of motion should be made small in turn, which is a disadvantage since (as 
will be discussed in Section 2.3.4) the time-step in CPMD is already very 
small. Thus, the u parameter assumes the role of an “adiabaticity param- 
eter” (is referred so in Ref. [MH09], p. 37). 

A computational prove of the adiabaticity of the CPMD trajectories 
will now be presented, and can be found with greater details in References 
[PSB91, BS98, MH09]. The electronic dynamics generated by eqn. 2.46 can 
be analysed in terms of correlations functions; in particular, for small (but 
finite) deviations from the electronic ground state (i.e. well made but real 
simulations), it has been proved that the Kohn-Sham orbital dynamics is 
given by a superposition of oscillators of frequency 

Wij = aama 7 (2.48) 
H 
where ej and e; are eigenvalues of unoccupied (virtual) and occupied 
Kohn-Sham orbitals, respectively. This means that the energy of the wave- 
functions oscillates about the expected ground state value, without de- 
parting significantly from it. Moreover, if with 


occ 


Cop (t) = LEO); E) (2.49) 


i 
we refer to the autocorrelation function of the occupied Kohn-Sham or- 


bitals, then it is observed that the power spectrum obtained using eqn. 
2.48 matches the Fourier transform of eqn. 2.49: 


rw) = [> caled f? YOO 
= cos(wt) KB); 4 ())adt (2.50) 


Thus, the smallest frequency in the electronic system dynamics is 


E 
Wmin X SP , 
u 


with Eg.p being the difference between the lowest unoccupied (“LUMO”) 
and the highest occupied (“HOMO”) orbital. For a reference system with 
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Egap = 2eV (= 7.35 - 107°? a.u.) and with u = 300 a.u. (a typical value for a 
CPMD simulation), we have that Wmin = 30600 cm! whereas the higher 
vibrational frequency of the nuclear motion will reasonably be < 4000 
cm. This great difference between the characteristic frequency of (elec- 
tronic) wavefunctions and nuclear motion ensures that, with a suitable 
choice of the parameter u, the electronic dynamics shall follow adiabat- 
ically the nuclear motion, with just a negligible energy transfer between 
slow (nuclear) and fast (electronic) degrees of freedom. 


2.3.4 Implementation 


Within the CPMD program !? and in most DFT calculations on condensed 
phases, it is common practice to model core electrons with pseudopoten- 
tials (which are discussed in Appendix C.3) and to expand in plane waves 
the wavefunctions of valence electrons only. 

The use of plane waves is particularly helpful and it is the natural 
choice of basis set to study periodic systems, but is also employed in 
CPMD calculations of non-periodic systems because the finite dimensions 
of the simulation cell as well as PBC by definition introduce a periodicity, 
albeit a spurious one. 

In a system with periodicity X, e.g. a molecular crystal, the electronic 
density is also periodic and we have 


pr) =p(r+X) . (2.51) 


In a non-periodic system simulated in a cubic cell adopting PBC, X would 
correspond to the cell length. Such density can be decomposed into its 
Fourier components (k) in the reciprocal k space, as 


p(t) = | apike" 


The latter integral can be discretised and reduced to a simple sum by 
adopting the approximated expression 


p(r) = 2 


15“CPMD” is the acronym of both the computational package (ref. [CPM]) and the ab- 
stract method (ref. [CP85]), but we risk no dangerous misunderstanding since it is always 
evident from the context what is what. 
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The Bloch theorem (see Ref. [JM11], pag.13, eqn. (1.4.9)) tell us that eigen- 
functions have the same periodicity (X) of the lattice: 


plr +X) =e, (1). (2.52) 

Thus, we obtain . 
p'le) = dur) = e**uj(k, x) (2.53) 
ui(k,r) =} acco , 2.54 

G 
which combined lead to 

p (r) = pile) = ec. (G)eC* (2.55) 

G 


To calculate the electron density and the total energy, a sum on all k-points 
should be carried out: 


2 
| (2.56) 


p(t) = Zu D ekl) 
1 

where w* is the weight of the k-point. Usually, the sum in eqn. 2.56 
is limited to a small number of k-points, chosen to reflect the simmetry 
properties of p. In crystals with significant lattice distortion (leading to a 
symmetry breaking) and non-periodic systems like gases and liquids, the 
sum can be limited to just the point T = (0,0,0) without loss of generality, 
so that eqn. 2.55 reduces to 


pP) =} ce. (2.57) 
k 


A cut-off Ecut for the k-sum is usually adopted, excluding from eqn. 2.57 
plane waves with kinetic energy greater than Ecut: 16 


1 
zlk + G|? < Ecut 


The optimal value for Ecu is affected by various factors, first of all the 
type of chemical species to simulate. Finally, it should be noted that the 
use of plane wave in place of more widely adopted Gaussians (see Ap- 
pendix 2.1.5) has the further advantage of avoiding undesired effects such 
as “basis set superposition errors” (BSSE) and Pulay forces, as discussed 
in Appendix C.2. 


16Since plane waves are simultaneous eigenfunctions of both the momentum and the 
2 
free-particle Hamiltonian, calculating the kinetic energy is particularly simple, Ep = a = 
1 K 
7 . 
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2.3.5 Hydrogen vs Deuterium 


In this dissertation, as well as in literature (e.g. [MSC08a,MSC08b,KMM*04]), 
many simulations are performed on deuterated or partially deuterated 
samples. Why this unusual choice, considering that deuterium (D, or °H) 

is much less common in nature that 'H? There are at least three good 
reasons behind it. 

First, IH is the lightest element relevant in chemistry, and deuterium 
mass is about two times greater. The fact that the mass of !H is so small 
represents a problem for Born-Oppenheimer MD, since quantum effects of 
the nuclear motion becomes really important in this case. Because in Born- 
Oppenheimer MD quantum effects of the nuclear motion are completely 
neglected, it is common practice to adopt an heavier isotope to mitigate 
this issue. 

Furthermore, swapping !H with ?H strongly affects the reduced mass 
of the system (u), thus affecting the vibrational frequency v, which, within 
harmonic approximation, is expressed as 

1 


va— , 
vr 

(with u being the reduced mass of the vibrating group). This is useful 
to “change” the vibrational frequencies of overlapping bands, in order to 
sort out every specific contribution to the spectrum independently. For 
example, the characteristic vibrational frequency of the O—H stretching is 
found at about ~ 3500 cm~!, while that of the O—D stretching is much 
lower at ~ 2500 cm. 

Finally, the dynamics of !H is much faster than that of ?H. This fact 
forces to perform MD simulations adopting a smaller time-step to cor- 
rectly describe and integrate the equations of motion, decreasing the size 
of the sampled phase space. In fact, when !H is used, for the ab initio 
MD simulations performed in this dissertations we chose a time step of 
4 a.u. ( 0.096 femtoseconds), while a longer time-step of 5 a.u. (~ 0.12 
femtoseconds) is adopted for fully deuterated simulations. 
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Simulating the molecular behaviour would mean nothing if, from this, one 
cannot extract some useful information. This latter can be correlated with 
experimental measurements and this allows interpreting or predicting the 
observed features. The information that can be extracted from ab initio 
MD can be roughly summarised in three categories, as following. 


Structural data, such as pair distribution functions, give insights about 
the average behaviour of the molecular systems, and are also useful 
to check the reliability of the simulation. 


The electronic structure calculations can elucidate empirical data, be- 
cause they are useful to create suitable models of interpretation. 


Time-dependent and spectroscopic data often combine the peculiarities 
of the previous two subjects, providing a direct comparison with 
experimental spectra. 


In this Chapter a brief presentation of a selected number of properties 
shall be given, to elucidate the theoretical background for the research 
work presented in Part II. 


3.1 Structure 


3.1.1 Pair Radial Distribution Functions 


Pair radial distribution function g(r) is an important thermodynamic quan- 
tity that describes how density of matter changes around a specific origin. 
The obtained picture is an averaged one, but it is useful to understand 
the microscopic structure of a condensed phase system. Moreover, other 
macroscopic properties are accessible from g(r). For example, the static 
structure factor S(k) could be proved to be (see for example Ref. [Cha87], 
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p. 209) proportional to the Fourier transform of g(r): 


+00 l 
S(k)=1-— ef drg(r)ek* 


where op = N/V is the density; or the energy (E) of a system of real 
interacting particles 


3 1. pre 
(fj= SNkaT + 5Np | dr g(rJufr) , 


where u(r) is the pair interaction potential. To define g(r), it is useful to 
start from the statistical distribution F(q, p) that gives the probability to 
find the system in a point of the 6N dimensional phase space (q, p), and 
whose analytic expression in the canonical NVT ensemble is 


F(q,p) =” / (ae dqdp , (3.1) 


where H is the system Hamiltonian and 6 = 1/kgT. 

Here I remind that with symbols q and p I refer to all coordinate and 
momenta components of all N particles and with [ dq dp an integration 
in the 6N phase-space is implied; to refer to the three specific coordinates 
of the j!" particle, the symbol r; shall be adopted. 

Since coordinates and momenta are decoupled (i.e. linearly indepen- 
dent), then F(q,p) can be splitted as a product of two terms, one purely 
configurational (P(q)) and one purely momenta-dependent (®(p)): 


F(q,p) = P(q)-®(p) , 
with 
P u e-BV(q) 
(d = Fage o 


where V(q) is the potential energy. P(q) is the probability of finding 
the system in a specific point q of the 3N dimensional coordinate space 
(which, in turn, is a subspace of the complete 6N dimensional phase- 
space). The probability distribution P(q) cannot be partitioned in single 
particle terms, due to the potential V (q), but it is possible to define a pair 
distribution function as 


p/n) (t, r2) = J 


+00 +00 +00 
drz du f drnP(q) , 
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which gives the probability of finding particle 1 in rı and particle 2 in r2 
simultaneously. The contribution of the other N — 2 particles (from 3 to 
N) has been averaged through integration. Obviously, particle #1 and 2 
have no particular significance since all particles are indistinguishable, so 
we need another distribution function o N ) (11,12) that would give the 
probability of finding whatever particle in rı and whatever other particle 
in rn. There are N possible choices for the first particle and N — 1 for the 
second, so 
0?!) (ty) = N(N —1)P?%) (41,19) 


In the same way, it is possible to obtain two single-particle functions 


PON) (r4) =| 


+ oo +00 +09 
dr2 drz-- J dryP(q) , 
or“) = Ns PO/N) (r) 


Due to the fact that o(/N) = N/V, this quantity is numerically equal to 
the standard definition of particle density p. 
A pair radial distribution function can be defined as 


g(r) = g(t.) = tum) ; (3.2) 


this quantity is essential to describe the so-called “structure” of non-solid 
systems, and allows obtaining quantitative information about the solva- 
tion sphere around a specific site. ! In fact, if we choose a specific site to 
be the origin of r coordinate, we have 


n(r') = 4719 E r elrjdr , (3.3) 


which is the number of particles that lie within a cut-off distance r’ from 
the site. When 7’ +> +00 we have 


+0 
anp | re(r)dr=N , (3.4) 


with N being the total number of particles. 


1] have dropped any vector glyph for the quantity r since, in this case, it is just the 
scalar distance |r, — n|. 
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Equation 3.2 is the statistical mechanical expression for g(r). It appears 
difficult to be computed (0°?/®) (r4, r2) requires multi-dimensional integral 
calculation), but it is very easy to obtain it in MD. It is sufficient to cal- 
culate at every time-step the distances between the site and the particles 
and create an histogram of these contacts. From this sort of histograms 
we obtain the probability of finding a particle at a specific distance from 
the site of interest. In fact, for a non-continuum system eqn. 3.4 becomes 

An(r) 


Anr® g(r) Ar = —— , 
g(r) r 


which, since pọ = N/V, can be rewritten as 


An(r) An(r) V 
N) = Fare? Ar" Näm Ar 
p nr r tr r 


a much more computationally feasible definition than eqn. 3.2. 


3.1.2 What Information can be recovered from g(r)? 


The g(r) function gives averaged properties, yet they are relevant to un- 
derstand the behaviour of molecules. 

The maxima of the g(r) function show the position of the solvation 
shells, whereas the minima show the boundaries of those shells. The 
integral 


i: = g(r)dr (3.5) 


is the number of particle of the first solvation shell if rl ,, is the first min- 
imum of g(r). Usually, the molecules of the first solvation shell are those 
that most interact with the site at the origin of the r coordinate, and, if 
H-bonding is possible, the integral of eqn. 3.5 coincides with the number 
of molecules H-bonded to the site. 

The value of the g(r) minima is also very important: the more it is 
small, the less mobility there is in the system. A value of zero for the first 
minimum means that the molecules in the first and second solvation shell 
are actually separated by an impenetrable “wall”, with no interchange.” 
What type of wall am I talking about? A potential energy barrier that 
allows or impairs the molecular mobility between solvation shells. 


2This usually means that the simulation is quite far from reaching ergodic limit. 
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This view of the potential energy barrier is supported by the so-called 
“reversible work theorem” (see [Cha87], p. 202), which states that for a 
fluid ° 

year un: (3.6) 
Ir N Bors ’ 


d 
the quantity — (Zve) is the mean force acting on particles expressed 


as a function of the distance r from the origin. The integral of a mean force 
is a reversible work, so the “mean force potential” 


w(r) = kgTln g(r) (3.7) 


also expresses the reversible work to be spent in order to move a particle 
along the r coordinate. Due to the fact that the In(x) function increases 
monotonically, high values of the mean force potential w(r) correspond 
to minima in the g(r) function. Obviously, all these considerations stand 
only near the ergodic limit of simulation long enough to sample the avail- 
able phase space. 


3.1.3 Angular Distribution and H-bond Functions 


The angular distribution function g(0) is computed analogously to the 
g(r) function; however, the contacts are reported not as a function of some 
interatomic distance but as a function of a specific angle ® (defined by 
three atoms). Moreover, the increase of 0 is associated to the increase of 
solid angle values, so with g(0) the sampling bin is not constant. Thus, a 
specific normalisation function has to be employed and g(V) behaves like 


contacts(0) 


g(8) ne) 


The dependence on both the distance r and the angle d can be com- 
bined into a single function g(r, 0). One of the possible choices for such 
function is the so-called “H-bond function” Er, 0), presented in Ref. 
[PCRS03] by Pagliai et al., and here reported for sake of completeness: 


FYB (r, 8) = Alr(t))- BC) , 3.8) 


3In a crystal, the g(r) function is exactly 0 for most values of the r coordinate even at 
thermodynamic equilibrium, thus giving divergences if eqn. 3.6 is employed. 
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with 
Ip Äre- r(t))?/202) if (re —r(t)) <0 
A(r(t)) = f, (rl) > (3.9) 
and 
B(8(t)) = exp(— (Ve — 8(t))?/203) if (Ve — B(t)) < 0 (3.10) 
1 if (0. — 0(t)) > 0 


Defining the H-bond geometry schematically as A—H- - - B, as reported in 
fig 3.1, this function can “weight” the H-bond strength when the coordi- 


Ag 


Figure 3.1: An example of H-bond between a donor diol (on the left) and an 
acceptor acetonitrile (right). The H-bond distance is represented with a dashed 
line. 


nate r corresponds to the H- - - B distance and the angle # to the A-H- --B 
angle. Parameters re and 0, are the equilibrium values for r and @, re- 
spectively, and are extrapolated from the unnormalised pair distribution 
functions. Conversely, parameters o, and oy are the half width at half 
height of the first peak of those distributions. 

The F#® function has already been employed, as well as in this dis- 
sertation, to study the H-bond network of methanol [PCRS03], of halides 
in methanol [PCS05, FPCS06], and of pyridine in water [PBMM*]. In par- 
ticular, this function has been developed to correctly describe even the 
lifetime of the H-bonding in methanol [PCRS03], therefore it is a suitable 
choice to understand and probe the dynamic behaviour of protic species. 
The only drawback is that it cannot be computed “on the fly”, because it 
needs data from both the g(r) and g(0) functions to be computed. 


3.2 Electronic Structure 


Ab initio calculation allows obtaining information about the electronic 
structure, because this latter (in the form of electron density or molecular 
orbital approach) is actually optimised. 
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To quantitatively describe the electronic structure, suitable parameters 
have to be used: the simplest ones are the partial charges associated to 
atoms in molecules. The story of partitioning the total molecular charge 
in contributions due to its component atoms is quite long, and still far 
from being at the end (see for example [Bad91]). Anyway, even the simple 
Mulliken partitioning [Mul55] of charge (“Mulliken population analysis”) 
can provide useful insights on what happens to electrons in molecule, 
where they gather and from where they withdraw. 

When orbitals are constructed as combinations of plane waves, specific 
methods to localise the electronic charge have to be adopted. Two meth- 
ods to achieve this goal are the Electron Localisation Functions and the 
Wannier centres analysis, which have been both employed in this disser- 
tation and are briefly presented in the following. 


Electron Localisation Functions 


Electron Localisation Functions (ELFs) have been developed by Edge- 
combe and Becke [BE90], who proposed to consider the following func- 


tion: i 
1) = — p - (3.11) 
1+— 
D? 


The function y(r) has value close to 1 where Pauli repulsion is weaker 
than in a uniform electron gas of the same density, thus giving an high 
localisation; on the contrary, where spin coupling is stronger y(r) shall 
tend to the value of 0.5, which corresponds to that of an homogeneous 
electron gas (which, by definition, is completely non-localised). In fact, 
the functions D, and D? are the so-called “Fermi-curvatures” of a real 
system and of a uniform electron gas, respectively; Fermi-curvature is a 
gradient function of the spin density (po): 


2 
D, ~ Ve) ; 


Po 
the more Do is small, the more the electron is localised. The maxima of the 
n(r) function can be mapped onto the real space of the molecule to give 
a topological view of electronic rearrangement. Another way to look at 
the ELFs was proposed by Savin and Silvi [S594]: the ELFs represent the 
bosonic character of electrons, with pairs of antiparallel electrons showing 
some bosonic behaviour and thus having high values of y(r). 
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Wannier Centres 


Maximally localised Wannier function (MLWF) centres represent another 
method to localise and visualise electrons in molecules. They are con- 
structed by employing unitary translation operators. Translations in quan- 
tum mechanics are represented by unitary operators that commute with 
the Hamiltonian, when the translation R is compatible with the system 
symmetry. 

Wannier functions [MV97,SMVP98] can be proved to coincide with the 
Fourier transform of the Bloch orbitals $x(r) (see eqn. 2.52): 


wr(r) ~ J ake R (1) 


These functions are still not localised. To reduce the space-spread of the 
Wannier functions, it is required to minimise the mean square displace- 
ments of the expectation values of the position operators; this minimi- 
sation is permitted because it is arbitrary how to set the phase of the 
wavefunction. 

Once obtained these new Wannier functions w(r), their centres are 
determined as [MV97]: 


re ~ —S{In(wle|w)} 


The position of the centres gives a straightforward picture of the electron 
pairs distribution in molecules, and it happens to closely resemble the 
qualitatively diagrams often drawn “by hand” employing simple VSEPR 
concepts. 


3.3 Vibrations: Some Rules of Thumb 


Molecular vibrations happen in a large time and energy scale. For the aim 
of this dissertation, the vibrations are those occurring in the 1500—4000 
cm! wavenumbers region, corresponding to energies of about 0.2—0.5 
eV. In this frequency region, vibrations are often associated to fast and 
very localised molecular motions, whereas under the 1000 cm! threshold 
there are many slow and delocalised modes, such as the collective lattice 


modes in a crystal solid. 


This spectral range can be probed experimentally with a number of 
techniques, in particular IR and Raman spectroscopies. These two types 
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of spectroscopies are somewhat complementary both regarding selection 
rules and the choice of solvent (for spectroscopy in liquid media). As 
a rule of thumb, we could say that the IR activity is associated to a 
change of the molecular dipole moment caused by an appropriate vi- 
bration, whereas Raman activity is due to a change in the polarisability 
tensor. In fact, the interaction of a system of slow-moving charges and 
an electromagnetic radiation can be reduced, for classical fields, to the 
Hamiltonian of some electric dipoles with the electric field E(t): 


H(t) = F(E) E(t) + FOH VER) +... (3.12) 


where ji(t) represent the dipole moment vector and C(t) the quadrupole 
moment 2” rank tensor. Limiting ourselves to the first term, thus disre- 
garding multipoles higher than the dipole, we can see that the latter can 
be expressed as made up of a permanent (fo) and induced terms: 


> 
ji(t)=jio+ W-E) +B -E(E)E(t)+.... (3.13) 
— — 
1stinduced dipole 


The first addend determines the IR activity, while the second one is related 
to the Raman activity. The electric field can polarise the charge distribu- 
tion creating an induced dipole even in molecules without a permanent 
dipole, and this new dipole is proportional to the polarisability tensor a. 


Moreover, it is useful to rationalise molecular vibrations on the basis 
of the harmonic approximation. This approximation is a rather crude one, 
but it is simple and powerful enough to clarify a lot of features with very 
little and straightforward reasoning. 

In particular, the harmonic frequency v for a classical oscillator is ex- 


pressed as u 
vo y ; a (3.14) 


where k is the so-called force constant and u is the reduced mass of the 
system. * The force constant is the Laplacian of the potential energy of 


“The reduced mass for a two mass system mı and mp is 
_ mym 


mı + ma 
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the harmonic oscillator Uyarm: 
k = V?Uharm (3.15) 


The H-bond induces, as shown in the following chapters, anharmonic 
effects, i.e. effects that cannot be completely understood with a simple 
harmonic oscillator model and which should require a more extended 
potential Taylor expansion around the equilibrium position: 


1 1 
U(x) = Uo + zk? +V Ux +... 


Svr 
harmonic term 


Nevertheless, the harmonic oscillator model still proves useful to have 
an immediate understanding of anharmonic effects, since these latter can 
often be interpreted as due to a change in the value of the force constant 
k. This way of rationalising H-bond effects onto vibrational spectra are 
recurrent in this dissertation. 


3.4 Vibrations: Linear Response Theory 


Linear response theory is the conceptual framework to describe systems 
slightly displaced from an equilibrium state, close enough to it to allow a 
linear approximation to work. Linear response theory is closely related to 
the so called “fluctuation-dissipation theorem”, which says that the fluc- 
tuations of a system at equilibrium are related to the response of a sys- 
tem slightly perturbed out of equilibrium. Linear response theory is thus 
relevant to obtain information about vibrational spectra: to obtain exper- 
imental spectra, the system is perturbed out of equilibrium by radiation 
absorption, and to recover this effect we have to study the fluctuations 
of our simulated molecular system (which is, hopefully, close to equilib- 
rium). 


It has been shown in Section 2.2.4 that the equilibrium average of a 
quantity d(q,p) can be expressed as 


"E d(q,p)f(q,p) dqdp 


rer q,p) dqdp | 
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in a canonical ensemble NVT, to be consistent with eqn. 3.1, the density 
of states f (q, p) is 


flap) =e PHP) 


with H(q, p) representing the equilibrium Hamiltonian. 
Now let us suppose that at time t < 0 the system was not at equilib- 
rium, due to a perturbation of the type 


A=-f0, 


with f a coupling term with property 0(q,p), and that the perturbation 
is switched off at t = 0. Thus, for t > 0 the system should relax towards 
equilibrium, but at t = 0 the density f (q, p) is still 


fae PHA) , (3.17) 


Equation 3.16 can be used to calculate an instantaneous average of 0 even 
for a non-equilibrium state, which shall be indicated as B to distinguish it 
from the equilibrium average (8). At t = 0 the average 0 is 


ae BHA) dadp 


[FOR aqap ọ, —B(H+A) dqdp , 


and for t — +o we have 8 = (B). At t > 0 the expression for 8 is equal 
to the previous equation 3.18; however, it must be pointed out that in this 
case the dynamics of the 0 function is governed by the Hamiltonian H, 
and not by H + A, because the perturbation has been shut off at t = 0. The 
difference between the two averages V and (8) is due to the perturbation 
and, if this perturbation is small enough, this allows expanding the non- 
equilibrium density f of eqn. 3.17 as 


f =e PHA) — ePH(1— BA+...) 


(3.18) 


Substituting this result in eqn. 3.18 and remembering that (0) = (0(f)), it 
can be proved that 


8 = (8) — B[(A 80)) — (8) (A)] +... 


The difference between the averages 8 and (0) tell us how the system 
returns to equilibrium after the perturbation is switched off, and can be 
expressed as 


BC) — (8) = BF ([8(0) — (8)] - leE) -= (®)])_, 
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where the averaged quantity, the product of differences 
Ca,a(t) = (1800) — (8)] - (8) — 8) (3.19) 


is called “autocorrelation function” and plays a pivotal role in obtaining 
time-dependent information from MD simulations. 

The autocorrelation function presented in eqn. 3.19 can be rewritten 
as 


C(t)3, = (8(0) B(t)) — (8)? 


and has many interesting properties; for example, near time t = 0 we have 
that 


C(~ 0)o,0 = ([8(0) — (9°) , 


whereas at large times the fluctuations become uncorrelated with the orig- 
inal value (0) and, as t — +00 we eventually have that C(t) 99 = 0. 

The autocorrelation function is easily accessible by MD simulations, 
since it can be calculated by substituting the ensemble with the time aver- 
age as in eqn. 2.39: 


tmax 
Cya(t) = -f dr A(t) ATi) 


Emax 


3.5 Vibrations: Spectra from MD Simulations 


3.5.1 Vibrational Density of States 


The vibrational density of states (VDoS) can be seen as representing the 
collection of all possible vibrations in a molecular system. It is a quan- 
tity accessible to inelastic neutron scattering and, in principle, contains 
all information also present in IR and Raman vibrational spectra. It is 
particularly simple to be calculated from MD simulations, exploiting the 
autocorrelation function formalism. 

Let us assume that the molecular system is made up of a collection of 
(classical) harmonic oscillators; thus, the coordinates of the it atom can 
be written as 

xilt) = A;cos(w;t + i) r (3.20) 
which can be differentiated with respect to time to obtain 
d ; 
vilt) = gH) = Aw; sin (wjt + i) F 
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here the constants are not very important, so we can limit to 
v;(t) ~ sin(w;t + i) 7 (3.21) 
and the velocity autocorrelation function takes the form 


Coo(t) = } (0: (0)v:(t)) ~ Di (sin (gi) sin(wit + 9i)) 


i i 


Exploiting the equality 


sina -sinb = =| cos(a — b) — cos(a + b) | 


NI = 


we obtain 
Coolt) ~ } (cos(w;t) — cos(w;t + 29;)) 


1 
The second term can be proved to be null, provided that phases ¢; are 
random distributed (which is true at thermodynamic equilibrium), due to 
the following equality 


cos(a +b) = cosacos b — sinasinb , 


so that 
27 


dọ cos(wt +2) = m dọ [cos(wt) cos(2¢) — sin (wt) sin(2¢)] =0 . 


Considering the only surviving term, we eventually arrive to 


Colt) ~ } (cos(wi;t)) . (3.22) 
i 
If we define T (w) as the Fourier transform of Cw(t) 
+00 
ries / Col tei" dt (3.23) 


and consider the time average by following the ergodic principle, like in 
eqn. 2.39, then eqn. 3.22 can be rewritten as an integral in frequency space 


Colt) Ef," Tw) coswit) dar, 


where the integral corresponds to the usual definition of real space Fourier 
transform. In this way, it becomes evident that T (w) represents the num- 
ber of oscillators in the system of frequency w, i.e. the quantity T (w) is 
the density of vibrational states. 
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The VDoS is a collective property of the system, and can be calculated 
as the Fourier transform of the velocity autocorrelation function as in eqn. 
3.23, because single-particle displacements can be viewed as superimpo- 
sition of all normal modes of the system. It has to be pointed out that the 
complete VDoS calculated via eqn. 3.23 should match the sum of all nor- 
mal modes of the system, but VDoS of specific pairs of atoms (i.e. T(w;)) 
do not necessarily match any given normal mode. 

In this dissertation, the VDoS of the various cases of study has been 
calculated with a more direct formula than eqn. 3.23, based on direct 
Fourier transform of the coordinate displacements. This should not be 
surprising, since the analytic expressions of the displacements x(t) in eqn. 
3.20 and of velocities v(t) in eqn. 3.21 are exactly the same, but for a 
phase factor to convert sin into cos. Moreover, in Ref. [Tuc10b], p. 546, it is 
shown that quantum and classical autocorrelation functions for a system 
of harmonic oscillators are directly proportional: 


hw 
C(t) quantum = Phw tanh (=) Case 


3.5.2 IR and Raman Spectra 


As shown in eqn. 3.12 and 3.13, the first term determining the interaction 
between matter and light is 


H(t) > Holt) - E(t) 


and it determines the IR spectrum. In Ref. [Tuc10b], at page 547, it is 
shown that, just like the VDoS can be calculated as Fourier transform of 
the autocorrelation function of the displacements, the IR activity can be 
calculated as the Fourier transform of the autocorrelation function of the 
molecular dipole moment (C,,,(t)): 


IR(w) = = tanh (FF) / : die Cult) (3.24) 
As in the case of VDoS calculation, this results is usually exploited by 
adopting the classical dipole moment autocorrelation function. 

In the present dissertation, I have observed that for high frequency 
vibrations (O—D and O-H stretchings) there is very little difference be- 
tween VDoS and IR spectra. Calculating IR spectra from ab initio MD tra- 
jectories is significantly more computationally expensive than calculating 
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VDoS, since methods based on MLWF centres have to be employed to lo- 
calise electrons, in order to calculate reliable dipole moments. In practice, 
after having collected a trajectory, a new calculation has to be performed 
to optimise the wavefunctions and employ MLWF centres analysis. 

Calculating Raman spectra requires even more computation, due to 
the fact that it is necessary to calculate not simply the molecular dipoles 
but also the six independent components of the polarisability tensor. I 
have not performed such calculations for this dissertation, but details 
about the implementation of Raman calculation in CPMD [CPM] can be 
found in Ref. [PCC 08]. 


3.5.3 Time-Sampling and Spectra: Part I 


I find appropriate to conclude this methodological chapter with some brief 
remarks about time-sampling and how it affects the computed spectra. 

In fact, the more the time-spacing is small (i.e. the more the time- 
sampling is dense), the higher is the maximum accessible frequency ob- 
tainable by means of eqn. 3.23. Moreover, the longer is the MD simulation, 
the higher is the resolution of the computed spectrum. This is obvious: to 
sample high frequencies, a short time-interval sampling is needed. And, 
to improve resolution, repeated sampling has to be carried out. 

A more quantitative discussion about these issues shall be presented 
in Section 4.1.1. To quantify these statements here and now, I pinpoint 
that even a huge time-step of ~3 femtoseconds provides a maximum ac- 
cessible frequency of ~5500 cm~!, and even a not-so-long trajectory of 
~17.5 picoseconds gives a frequency resolution of less than 2 cm! (see 
Ref. [MMPCR12a]). 
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In this Chapter a quite deep discussion about Fourier and wavelet trans- 
forms shall be presented. Fourier transforms have already been exploited 
in eqn.2.50 as well as in sections 3.5.1 and 3.5.2, but here they will be 
discussed from a more general point of view in order to introduce time- 
frequency analysis. 

What is time-frequency analysis? We could say that music is concep- 
tually closely related to it: sheet music, something that every musician is 
used to read, associates a specific acoustic frequency (note) to the specific 
instant when it should be played. In time-frequency analysis correlation 
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Figure 4.1: Sheet music. The first 9 measures of the Canon in D of baroque 
composer Johann Pachelbel. 


plots similar to sheet music are obtained. 

There are various mathematical “tools” to perform time-frequency anal- 
ysis, and the discussion shall be devoted mainly to short-time Fourier 
transform and wavelet analysis. Actually, the former is didactically help- 
ful, the latter is much more accurate and useful. 


4.1 Fourier Transform 


The most used “tool” to obtain frequency information from a time-resolved 
signal is probably the Fourier transform (FT). There are many ways to in- 
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troduce this type of transform; in my humble opinion, adopting a quantum- 

mechanical formalism is the closer to a physical-chemist point-of-view. 
Position eigenstates |x) can be used as a base for the state ket |$) 

producing the usual definition of wavefunction (already seen in eqn.2.9) 


p(x) = (x16) , (4.1) 
and of its complex conjugate (already seen in eqn.2.10) 
p(x)" = (|x) 


In order to have a momentum-space wavefunction, we have to change the 
basis set employing the £*(IR) Hilbert space closure relation 


=f dplp)ipl > (4.2) 


where the symbol I is used to represent the identity operator that has the 
property 
(plil) = (Bild) 


This remarkable property can be exploited in eqn.4.1 to obtain: 


(x) = (alo) = (elle) = (al [ap ip)(pl] lo) = fap Gp) (ple) . 


(4.3) 
The latter equation can be put into the following form 
+00 
P(x) = j dpelx)plp) , (4.4) 


where the coefficient of the expansion c(x) is 


c(x) = (xp) 


The c(x) coefficient is the projection of a momentum eigenket into co- 
ordinate space: in fact, the momentum eigenket satisfies the following 
eigenvalue equation 

Pip) = plp) , 
which can itself be projected into momentum space with the usual opera- 
tor substitution 


A „0 
Po-ih 
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obtaining 
0 
hl) = pl) - (4.5) 
Equation 4.5 has as solution ! 
(x|p) = et 


which enables us to rewrite eqn. 4.4 as 


p(x) = IM dpet™P*p(p) . (4.6) 


By remembering that, if (x|p) = e*""P*, then (p|x) = (x|p)* = et""P*, 
the ¢(p) = (p|®) function can be expanded as 
+00 P 
o(p) =f dee pa) (4.7) 


— 00 


Defining p = hk and k = p/ħ, equations 4.6 and 4.7 become 


p) = f dxe™9(x) = Flow] , 
p= f dket™9(k) = F7] 


foe} 


(4.8) 


The pair of equations 4.8 defines the FT and its inverse. Moreover, because 
in the £? (R) Hilbert space we have (in analogy with eqn. 4.1) 


p(k) = (klp) , 
then the F-transform can be defined also as inner product: 
FR] = kl) - (4.9) 


Thus, the FT can be reduced to a “tool” that changes the vector basis 
and shifts information from coordinate to reciprocal space. Furthermore, 
the actual discussion has been made possible by the fact that coordinates 


The complex conjugate ((x|p))* = (p|x) = e7"P* is too a solution of eqn. 4.5, and in 
literature there is ambiguity regarding the sign of the exponent of the FT. In this disser- 
tation the minus sign for the FT and the plus sign for the inverse have been consistently 
adopted . 
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and momenta are “conjugate variables” (i.e. [%,p] # 0) and their uncer- 
tainties are related by the Heinseberg principle: 


h 
AxAp2= ; 
rap Ag 


this is a very relevant assumption because it enables to extend this way 
of reasoning to any other pairs of conjugated variables related by a sim- 
ilar uncertainty relationship. In fact, we have similar relationship for the 
energy-time pair: ? 

h 

AtAE > 5 


Remembering that AE = hAw, the latter becomes 


AtAw > (4.10) 


NI = 


Thus, one can “jump” from time to frequency space adopting Fourier 
analysis. 


4.1.1 Time-Sampling and Spectra: Part II 


Inequality 4.10 is very important because it allows a deep discussion about 
time and frequency relationship. I have already anticipated that the time- 
sampling affects the power spectrum (i.e. the FT): this happens because 
of inequality 4.10. Here with Ax I refer to uncertainty in the x property, 
whereas with N I refer to the number of time-step At, i.e. the length of the 
signal expressed in units At. 

Obviously, we have that 


tmax =N-At , (4.11) 


and, moreover, 
Wmax = N -Aw . (4.12) 


With At fixed, increasing tmay has the effect of increasing N (eqn. 4.11). 
When N increases, since Aw œ 1/N (eqn. 4.12), the frequency uncer- 
tainty Aw shrinks, thus the actual spectral resolution increases. So, from a 


2However, I have to point out that time is not an observable in non-relativistic quan- 
tum mechanics, but a parameter. 
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longer MD simulation, we can obtain a vibrational spectrum with a higher 
resolution. In fact, it can be proved that 


1 
A 1) = —_______ 
an) tmax(sec) - c(cm/sec) 
with c being the speed of light in vacuum. By using this equation one can 
obtain the numbers anticipated in Section 3.5.3. 
Moreover, since N = tmax/At (eqn. 4.11), inserting it in eqn. 4.12 we 
have 


Wmax X WZ , 


At 
proving that smaller time steps allow reaching higher frequencies This 
explains why adopting smaller time-steps allow obtaining a larger spectral 
window. 


4.1.2 What the Fourier Transform Just Cannot Do 


Fourier transform can extract the power spectrum of a time-resolved sig- 
nal, but it has some limitations. 

First of all, very similar signals can have a very similar Fourier spec- 
trum ? , as pictured in Ref. [MKO93] and also shown in fig.4.2. As can be 
seen, the f(t) abruptly changes frequency of oscillation in the middle of 
the time axis; on the contrary, the g(t) signal always shows two frequen- 
cies simultaneously. They are two very different signals, yet their Fourier 
transforms are almost indistinguishables because they are characterised 
by the same pair of frequencies. 

Moreover, Fourier transform cannot be used to analyse signals “on the 
fly”, Le. those not completely decayed to zero. This is due to the fact 
that the Fourier time-window spans from —co to +00, and a signal has to 
decay to zero by a finite time for a reliable Fourier transformation. The 
time-window of the FT is so large because the function that gives the size 


3Here I have to specify that, unless explicitly stated otherwise, every plotted FT of 
some function f(t), actually represents the square modulus of the said FT, i.e. 


+00 f +00 ; 
Fr] ` F* fo] = f dt f (t)e™ it è N dt f (te 
Moreover, every plotted FT reported here is obtained by the so called “Fast Fourier Trans- 


form” algorithm (FFT) adopting the free fftw package [FJ97, FJ98, JF08] or the ESSL pack- 
age developed by IBM. 
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Figure 4.2: Two time-dependent signals f(t) (panel a) and g(t) (panel c) and 
their respective Fourier transforms (panels b and d). 


of this window is a plane wave, which, as already discussed, has its square 
modulus equal to 1 over all its domain (which happens to be all R). To 
overcome this issue, the so-called short-time Fourier transform has been 
introduced, defined by 


which limits the integration just to a finite time-window. The basic idea 
is that the computation is repeated in various time-windows of length 
2T in order to sample the entire signal. Notwithstanding this, a similar 
approach is affected by a very serious issue, called “aliasing”: in fact trun- 
cating the time series at the edge of the finite time-interval introduces into 
the power spectrum spurious high frequency signals. These signals alter 
the overall appearance of the spectrum and are seriously misleading. 
Furthermore, there is the problem that a very compact and well re- 
solved time-signal has its power spectrum spread over a very large in- 
terval of frequencies. This can be appreciated in fig. 4.3, where the FT 
of a Dirac’s delta function (panel a) is a completely delocalised function 
(panel b), whereas the transform of a less time-localised function (panel c) 
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Figure 4.3: Two time-dependent signals f(t) (panel a) and g(t) (panel b) and 
their respective Fourier transforms (panels b and d. 


is better localised in the frequency space (panel c). This can be considered 
a consequence of the time-frequency uncertainty relation (eqn. 4.10): a 
signal well localised in time (the Dirac’s delta is the best example of this 
type of function) contributes to all possible frequencies of the spectrum. 
To overcome these limitations and obtain good time-frequency local- 


isations, many approaches have been developed, mainly based on the 
short-time Fourier transforms. 


4.2 Wavelet Transform 


FT is usually employed to extract the frequency content from a time- 
dependent signal, without a simultaneous localisation in both frequency 
and time domain. In order to obtain vibrational properties from MD tra- 
jectories, Fourier transforms are performed, [Tuc10b,BP76,D. 00] but it has 
been recently shown [MSC08a, MSC08b, RW05] that similar results can be 
obtained with a wavelet analysis [C. 92] approach. 


We have seen that FT can be interpreted as the inner product (eqn. 4.9) 
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of the state vector onto a specific basis of plane waves (w|: 


+00 
Fip(t)] = (wl) = |, ateg) 


The wavelet transform (WT) can be defined in an analogous way as the 
projection of the state vector |) (which belongs to £7(IR)) onto a basis of 
(Ws,r| functions: 


1 1 pe 
Wiel] = cle) =f atv) + 19) 


the w(t) function takes the name of “wavelet” and the range of t values 
where Ys z(t) # 0 represents the sampling window of the transform. The 
wavelet depends on two real parameters s and t whose meaning shall 
soon be explained, and these tune the span of the wavelet. 

The wavelet y,;-(t) can be obtained in the following way 


S 


plt) = € = =) (4.14) 


from a so-called “mother wavelet” p(t), which also has to satisfy the two 
following relationships 


[7 aty(y =0 


< © 


The former relation implies that y(t) should really be a wave, with an 
oscillatory behaviour, whereas the latter implies that y(t) has no power 
spectrum around w = 0. 
It can be proved (see Ref. [C. 92], pag 61) that the time-interval sampled 
by the WT is 
[T + ste — SA4, T + ste +84] , 


where t. is the time corresponding to the centre of the wavelet and A; is its 
radius. In the same way can be proved that the corresponding frequency 
interval is i í 
w w 
— A; 7 — + Au 1 
Ss s Ss s 


where w. is the centre of F|y(t)] and A, its radius. It can be seen that 
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Fourier transform Short-time FT Wavelet transform 
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—— 


Figure 4.4: Schematic representation of the windows for the time sampling of 
Fourier, short time Fourier and wavelet transforms. On the x-axis is reported the 
time, on the y-axis the frequencies. 


the time interval expands itself when s increases, whereas the contrary 
could be said for the frequency interval. Moreover, the T parameter has 
the effect of translating the time window. Thus, while for the FT the time- 
sampling window spans all the time axis, for the WT this window has a 
rectangular shape in the time-frequency space and is tuned by the pair of 
s and T parameters, as pictorially illustrated by fig. 4.4. 

The sampling of the FT prevents a time localisation of the frequencies, 
whereas the sampling of the WT has an optimal time-frequency localisa- 
tion capacity that, moreover, can be tuned changing some parameters to 
obtain the desired time-frequency accuracy. The time-frequency accuracy 
is, in fact, the result of a trade-off between time and frequency accuracy, 
respectively, due to a Heisenberg-like uncertainty theorem that stands true 
for WT too [C. 92]. Please notice that to extract high frequencies one uses 
a small time-window, which is intuitively reasonable since to sample high 
frequency oscillation just a small time-interval is needed; on the contrary, 
to extract low frequencies it is necessary to sample a larger time-interval 
to catch all the oscillations there. This a sort of “smart” sampling that 
short time Fourier transform just cannot do, since in that case the time- 
frequency sampling boxes are fixed once and for all (see fig 4.4, central 
panel). 

Obviously, the results obtained with WT are very dependent on the 
choice of the functional form of the mother wavelet. One of the most 
employed mother wavelets is the so-called Morlet-Gabor (MG) function 
[CHT98], which is basically the product of a plane wave by a Gaussian 


1 r 
p(t) = en wi (4.15) 


and whose plot is reported in fig. 4.5 along with those of its real and 
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imaginary parts. 
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Figure 4.5: Morlet-Gabor mother wavelet. 


In the literature other types of mother wavelet are also used: two of 
them are the Paul function of order m, defined as 
Pie tae 


POS Faeyr’ 


and the DOG (“derivative of a Gaussian”) functions of order k 


(—1)"+1 q™ 


eee es eee ae 
/T (m+ 1/2) dt" 


p(t) 


Unfortunately, spectrograms obtained with those latter wavelets do not 
have a time-frequency resolution suitable to MD trajectory analysis [MMH04]: 
the Paul wavelet gives a spectrogram with poor accuracy in both time and 
frequency, the DOG wavelet gives discontinuous spectrograms that, more- 
over, are difficult to understand due to the presence of too many Fourier 
components. In fact, in Ref. [Kir05] it is shown that the MG wavelet best 
reproduces the Fourier spectrum. In this dissertation, as well as in Refer- 
ences [RW05, MSC08a, MSC08b], only the MG function is adopted. 
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4.2.1 Explanations 


The question is: how does the WT works? It is not so intuitive, in par- 
ticular the meaning of all those parameters (s, T, 7) appearing in the WT 
and into the wavelet function (wo). This Section shall try to answer this 
question. 

In my opinion, the best way to understand the meaning of the s pa- 
rameter is due to Meyers et al. [MKO93] that proposed to calculate the 
WT of a function f(t) whose Fourier power spectrum was known, and 
searched for the value of s that maximises the wavelet power spectrum. 
Adopting the MG function as mother wavelet 


p(t) ~ elote—P/20° 


it is possible to prove that the s parameter and the A Fourier wavelength 
are related by 


wo + (2+?) 
Art d 


which can be expressed in term of frequency w = A~! as 


wo + 4/ (2 + w3) 
w= — Ho. (4.16) 
This latter equation induces also the choice for the wọ real parameter: in 
fact, if we put wo = 271 (as done in all wavelet computations of this disser- 
tation), eqn. 4.16 reduces to the simple equation w ~ 1.01/s. Therefore, 
it is a good approximation to think of 1/s as the actual frequency of the 
signal. 

All this mathematical formulation can also be understood qualitatively. 
The s parameter stretches and shrink the mother wavelet p(t) to generate 
the wavelet basis p((t — T)/s). But, as already discussed about fig. 4.4, 
the value of s affects also the time-frequency interval used to sample the 
time-dependent signal, in such a way that lower frequencies are obtained 
when s increases, thus displaying the inverse relationship between these 
two quantities, as explained quantitatively by eqn. 4.16. 

The meaning of the parameter wo is also obvious due to eqn. 4.15: 
wo is the main sampling frequency of the mother wavelet that acts as a 
window function. This means that WT only samples the wo frequency? 
No, because the function is stretched both in time and frequency space 
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by s, as previously discussed, and the real sampling frequency of the 
“daughter” wavelets p((t — T)/s) is given by eqn. 4.16. 

Now also the meaning of the T parameter should be clear: whereas 
s stretches and shrinks the time-frequency interval, T translates in time 
the wavelets without affecting their shape. It is T that allows obtaining 
a time-resolved spectrum; T, in fact, has the same dimension of the time 
variable t. 

The value of o in the exponent of the MG wavelet (see eqn. 4.15) has 
a slightly more complex effect onto the spectrum. Basically, o tunes the 
width of the Gaussian window of the MG function, and, thus, fixes the 
time-interval used to sample the signal. As has been already said, the 
s parameter shall stretch and shrink this interval, but starting from the 
default interval set by ø. Thus, a large value of sigma produces time- 
resolved spectra that have a very low time-resolution, albeit with optimal 
frequency accuracy. This is easily understandable, since the larger is the 
time-sampling interval, the more the spectrum obtained with WT becomes 
similar to that obtained with FT and the time-resolving capacity of the 
former is compromised. On the contrary, with smaller value for o the time 
localisation capacity of the WT increases, but decreasing the frequency 
accuracy. Another way to picture this behaviour is the following: the MG 
wavelets obtained from eqn. 4.15 represent the sampling windows in the 
time-space; anyway, the sampling windows in frequency space are their 
Fourier transforms 4 


FIPO] = Bw) ner; (4.17) 


thus, it is evident that the frequency interval spanned by (w) decreases 
with the increase of g, allowing a better frequency-sampling. 


4.2.2 Some Examples 


The WT can be used both to reproduce smoothed power spectra (this is 
one of the most employed applications of the WT in chemistry, see for 
example Refs. [KJ09, AWW *97,JDMN00,UP06, Ehr02]) and to obtain time- 
frequency spectrograms. Here I present a couple of examples and graphs 
to visualise both these possibilities provided by WT. 


4Remember from Section 2.1.5: the FT of a Gaussian function is still a Gaussian func- 
tion. 
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Figure 4.6: a) the test function defined in Eq. 4.18; b) its Fourier power spec- 
trum; c) its wavelet spectrogram projected on the frequency axis (wavelet power 
spectrum). A value of 16 for the o parameter has been adopted for a better 
frequency localisation. 


The test function is a cosine oscillating at three frequencies, modulated 
by Gaussian windows: 


f(t) =cos(3wt)e t) / T4 
os t/t as 


cos(wt)e 0/7 , 


with t = 10%, w = 0.0025 and t3 > ta > t4. In fig. 4.6 a comparison be- 
tween Fourier and wavelet trasforms of a test signal (a panel) is reported. 
The Fourier power spectrum and the wavelet spectrogram projected on 
the frequency axis (wavelet power spectrum) of f (t) are reported in panel 
b and c of fig. 4.6, respectively. The wavelet transforms not only are able 
to reproduce with good accuracy the Fourier power spectrum, but they 
also give spectrograms like the one in fig. 4.7, localising the test signal in 
both time and frequency. This correlation plots is conceptually similar to 
sheet music (see fig. 4.1). 
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Figure 4.7: Wavelet spectrogram of the test function defined in Eq. 4.18. A value 
of 1 for the 7 parameter has been adopted for a better time localisation. 


In figures 4.6 and 4.7 I have used two different values for o: a relatively 
huge value for the former, since in that case I wanted to reproduce the FT 
power spectrum, and a very small value for the latter, because I wanted 
to show the accuracy in time localisation of the three-frequency signal. To 
discuss the effects of ø, in fig. 4.8 the FFT of a signal coming from ab 
initio MD simulation (see Ref. [MMPCS11]) is compared with the WT at 
various values of the 7 parameter, showing details of the band structure at 
increasing resolution. The WT appears to correctly reconstruct the Fourier 
spectrum with high values of parameter ø, showing that the truncation in 
the GM wavelet does not significantly affects the calculations. Moreover, 
it is also apparent why WT is frequently adopted for smoothing spectra 
[SLC03, Ehr02,SSZ*06]: for low values of 7, WT acts as a low-pass filter 
that removes high-frequency noise. ° 

For a test function like that of eqn. 4.18, the trend of the width at 
half-height (T) of its 1” frequency peak can be plotted with changing of v. 
A roughly hyperbolic dependence of I on o is obtained, as shown in fig. 
4.9, meaning that the spectrum becomes more and more blurred with the 
decrease of 7. On the contrary, for a signal of this type (I remember that 


5It has to be pointed out, however, that when WT is used only to smooth spectra, 
function such as the Haar wavelet [Haal0] and not the MG wavelet (whose primary use 
is time-frequency localisation) are usually employed as mother wavelets. 
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Figure 4.8: Fourier (blue) and wavelet power spectra (violet) of the displacement 
of inter-molecular Aro...p(t) function of site #2 of ethylene glycol [MMPCS11] 
with changing of the value of o. 
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Figure 4.9: Spread of the wavelet spectrum as a function of the 7 parameter. 
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this is an “artificial” signal, meaning that the analytic form for the input 
function is known and is reported in eqn. 4.18) the frequency-accuracy 
does not improve significantly for values of o higher than ~10. 


4.2.3 Implementations 


WT can be implemented in practice in at least two ways. There is a direct 
and straightforward approach, which basically consists in applying the 
following equation 


t—T 
S 


1 Lyre 
Wle(t)| = — = — il dt y* t 4.19 
= sso) = [dew (S*) 0) aay 
and calculate many time this integral for a lot of values of sand T in order 
to map all the time and frequency space spanned by the input function 
p(t) and its Fourier transform p(w), respectively. Otherwise, it is possible 
to calculate the bra(c)ket 


(Psl) 


into the Fourier space. Since I have employed both methods in my re- 
search activity, I am going to briefly discuss them. 


Direct Space Implementation 


To uniform the notation to what can be found in literature, I adopt now the 
formalism of Torrence and Compo [TC98]. With this formalism, equations 
4.13 and 4.19 can be rewritten as 


Wens) = 512 [ "ar FRA), (4.20) 


where Yn, is the wavelet and n plays the role of the time-shifting param- 
eter a in the previous equations. The discretised expression of eqn. 4.20 is 
(Ref. [TC98]): 


N-1 '_n). 
W(n,s) =s" $ f(n! -öt)y* aS (4.21) 


n'=0 à 


In eqn. 4.21, the product n’ - ôt represents the total time at the time-step # 
n’ of the MD and localises the signal in time. Thus, the wavelet transform 
W(n,s) gives the frequency content of the signal f(t) over a Gaussian 
time-window centred at n - ôt. Obviously, here n, n’, N (the total number 
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of time-steps) are all natural numbers (hence the symbol change) and ôt, 
being the MD time-step, has a finite non-zero value. The algorithm calcu- 
lates the previous sum and looks for the value that maximises the modu- 
lus |W (n, s)|? of the WT at a given time-step n’. The corresponding value 
of 1/s is taken as the “instantaneous frequency”. A cutoff of + 30 from 
the maximum of the GM wavelet function has been used to allow retain- 
ing more than 99.7% of the power spectrum energy and save computation 
time. The cutoff does not significantly affect the spectra [MMPCS11]. 

This “brute force approach” (i.e. just calculate the sum of eqn. 4.21) 
has the advantage of being relatively easy to understand and implement, 
but the resulting algorithm is very slow and memory consuming. Even 
adopting the + 30 cutoff for the time-sampling of the signal, it requires 
a lot of calculation. The main issue is given by the fact that variables 
(indexes in a Fortran90 program) n, n’ and s have to be changed indepen- 
dently and the results stored into memory. Thus, the first version of this 
algorithm that I wrote was not able to run on a normal PC, if not for input 
function up to N = 5 - 10* points (time-step). Since ab initio MD produces 
trajectories with hundreds of thousands time-snapshots, the code had to 
be parallelised ® by adopting the MPI [GLS97] paradigm (and library of the 
same name) and run on a IBM power6 workstation. Typical calculations 
on this machine could last many days. This arrangement is less than sat- 
isfactory, since the computing time of the power6 workstation available at 
the Chemistry Dept. of the University of Florence is used for many other 
calculations. 


Fourier Space Implementation 


Having assessed that wavelets could be used to study ab initio MD trajec- 
tories [MMPCS11], I have developed 7 another code that computes WT in 
Fourier space and (partially) overcomes some of the shortcomings of the 
“brute force approach” described before. 


67 acknowledge the invaluable help of Prof. Gianni Cardini (Chemistry Dept., Univer- 
sity of Florence) for this task. 

7i acknowledge the help of Dr. M. Pagliai (Chemistry Dept., Univ.of Florence) in 
developing my first version of the Fourier-space WT code. I also acknowledge Torrence 
and Compo [TC98] for providing an old Fortran77 code that I have used as a reference 
to develop my own. The code of Torrence and Compo is available free of charge via the 
internet at http: //paos.colorado.edu/research/wavelets/software.htnml. 
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Figure 4.10: On the left, the IBM power6 workstation used to run the direct-space 
WT code is displayed: it has 12 so-called “blades” (i.e. computing node), each of 
them equipped with 4 cores and 8 GB of RAM. On the right there is a photo of 
the AthlonII-x4 PC (equipped with 4 GB of RAM) that I currently use to run 
the Fourier-space implementation of the code. 


The algorithm adopted in subsequent research activity computes the 
WT of eqn.4.21 in frequency space as 


gi/2 = N) * (sor)e )erornöt , (4.22) 


where k is the frequency index, w, is the angular frequency and fk and » 
are the Fourier transforms of the time series f„ and of the adopted mother 
wavelet y, respectively. 

Eqn. 4.22 can be better understood if put in it continuous form 


Wu) = [deo et ogo), 
which can be put in a more succinct way as 
Wals) = SPF fo) piso]; (4.23) 
this latter equation means that the WT of the signal f(t) can be obtained 


computing the inverse FT of the product of the FT of the complex conju- 
gate of wavelet p} ,(t) and of the signal f(t) itself. In fact, defining the 
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convolution product of two function a(t) and b(t) as 


+00 
a(t)«b(t) = | drambt-n) , (4.24) 
and remembering the convolution-Parseval theorem of the Fourier analy- 
sis 


Farb] =F la] x Fib] , 


we immediately obtain that eqn. 4.24 can be explicitly expressed using 
Fourier transforms and their inverse: 


a(t) * b(t) =F 4[F[f| x Fig]. (4.25) 


Following definition 4.24, it is evident that WT (eqn. 4.20) can be written 
as a convolution product 


Wa(s) = s-1/7F(t) x p*(t/s) 


if we put y = n/s. Deriving eqn. 4.23 from the property of convolution 
product expressed by eqn. 4.25 is therefore straightforward. 

The algorithm based upon eqn. 4.22 is a little more difficult to be 
understood, but it is much more faster and computationally feasible than 
the “brute force approach”. This is mainly due to two reasons. The first 
one is that in the Fortran90 code eqn. 4.22 can be implemented using just 
two variables (indexes) n and s, since the w integration is demanded to 
the FFT routines. This saves both memory and computational time. The 
second one is that FFT routines, as already hinted, are nowadays truly fast 
(as their name implies) and computationally optimised. 
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Case Studies 


In the following Part, details of simulation and analysis of hydrogen- 
bonded systems are reported. These are research case studies, and I have 
contributed to them at various levels. Here I present a summary of my 
contributions, explicitly stating what my contribution actually was and 
what was done by others. 


Regarding the study of the dipeptide dynamics in a phospholipid 
membrane (Chapter 5), I mainly devoted myself to analyse the classical 
molecular dynamics simulation to extract structural properties and corre- 
lation function to probe the H-bonding behaviour. I did not perform the 
molecular dynamics simulation nor the experiments. I have performed 
the analysis of the simulated data, which has also been used to check the 
reliability of the simulation protocol, and this induced in turn to perform 
new simulations to rule out some possible sources of errors. I wrote part 
of the supplementary information of Ref. [VCMMR11] (mainly regarding 
the hydrogen-bonding dynamics), as well as creating most of the figures 
for the main paper. 


I performed the calculation and analysis of data regarding propane- 
diol and ethylene glycol in water (Chapter 6), and I developed the orig- 
inal direct space wavelet code to study their dynamics. Subsequently, 
I decided to study the propanediol dynamics in acetonitrile for a more 
straightforward comparison with experimental data. The research regard- 
ing propanediol in acetonitrile was designed and performed by me, with 
the exception of the experimental work. I also wrote the draft of the arti- 
cles [MMPCS11, MMPCR12b]. 


In the study about methyl acetate in water and methanol (Chapter 7), 
I performed the wavelet analysis of the CPMD trajectory. I did not per- 
form the main CPMD simulation. I developed the computational scheme 
adopted to correlate frequency with structural data. I also wrote some 
parts of the papers (Ref.s [PMMC*+10, PMMC*11]), in particular those 
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regarding the adopted wavelet protocols. This research was partially de- 
signed by me. 

Regarding the behaviour of thiazole in water (Chapter 8) I performed 
wavelet analysis on the simulated data. I did not perform the main CPMD 
simulation. I performed ab initio static calculations to obtain information 
about charge transfer occurring during the simulated run. I wrote part 
of the paper (Ref. [MMPMMS11]) and large parts of the supplementary 
information, as well as creating most of the figures regarding the analysis 
of data. 

Finally, I decided to investigate the highly unusual behaviour of one 
of the few example of a stable bifurcated H-bond in lithium nitrate trihy- 
drate (Chapter 9). I performed the calculations and the analysis of data 
regarding this system. The research was designed by me and I also wrote 
the draft of the corresponding article (Ref. [MMPCR12a]). 


Part of this thesis include research work already published in the pre- 
viously mentioned papers [VCMMR11,MMPCS11,MMPCR12b,PMMC*10, 
PMMC*11,MMPMMS11, MMPCR12a]. In particular, some paragraphs of 
the following Part of this dissertation, as well as most of the subsequent 
figures, are taken from the drafts of the aforementioned papers of which 
I am either first author or co-author. 
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5. Anchor Dipeptide into Phospholipid Bilayer 


Linear and non-linear spectroscopic methods with classical MD simula- 
tions and (static) quantum mechanical calculations have been combined to 
probe the structural features of a simple dipeptide, namely N-myristoylated- 
methyl-glycine 

(MrG), interacting with a phospholipid membrane fragments (namely 1- 
palmitoyl-2-linoleyl phosphatidylcholine, a.k.a. PLPC); both of them are 
schematically represented in fig. 5.1. This artificial dipeptide has been 
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Figure 5.1: Schematic representation of PLPC phospholipid and MrG dipeptide 
molecules. MrG has been marked using 13C in the shown position. Two C=O 
groups of PLPC are labelled as sn-1 and sn-2. 


designed to mimic the natural anchor that can be found in a variety of 
membrane associated polypeptides [JO90]. 

With its myristoyl tail, MrG represents a typical example of proteins 
that associate by means of hydrophobic tails (the anchors) attached to cer- 
tain conservative structural segments of the polypeptide [JO90]. In fact, 
anchoring often provides an opportunity for a stable, specific and (rela- 
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tively) controllable form of interaction of the polypeptide with the mem- 
brane surface (of certain composition and fluidity) on the time scale of 
cellular metabolism. Thus, a deeper understanding of the details of this 
peptide-membrane interaction at the molecular level, beside the general 
relevance of learning about the effective anchoring of proteins to phos- 
pholipid bilayers, recently became an important step in the realisation of 
novel anticancer remedies [dip06]. As a matter of fact, in the continuous 
search for new clinical treatments for cancer, small polypeptides can be 
employed as subjects in drug design studies [ACAR96]. Even nowadays, 
the selection process of substances for medical applications continues to 
be based essentially on clinical evaluations and tests that are mostly phe- 
nomenological in character. Moreover, the knowledge of their structural 
properties is often very limited, even for certified drugs of this type ad- 
mitted to clinical practice, because, when injected into blood plasma, the 
molecule experiences a variety of different environments and can even be 
subject to partitioning into the phospholipid membrane. 


Optical spectroscopy allows identifying the structural properties of 
molecules in these latter environments. Other experimental tools, such as 
calorimetric and electrochemical techniques, cannot to help in the identi- 
fication of the relative arrangement of molecular moieties on a local scale, 
and X-ray and electron scattering experiments are not suitable to probe 
disordered system like the membrane samples in physiologic conditions. 
In fact, in membrane samples even NMR spectra are heavily broadened as 
a result of the slow molecular motions on the time scale of the sampling 
processes. 


Conversely, methods based on nonlinear optical effects make it pos- 
sible to probe both the intra- and inter-molecular structural correlations 
between chromophores, including their time-evolution on a subpicosec- 
ond time scale. As a matter of fact, third order nonlinear experiments 
in the IR spectral region [HLH98] allow probing certain components of 
the molecular susceptibility and, hence, the related structural properties 
on the time scale of the atomic motions. 2D-IR spectroscopy has been 
recently applied to characterise phospholipid membranes [VCZ*07] and 
to perform structural and dynamical analysis of several polypeptides (in- 
cluding those of potential anticancer activity) associated with phospho- 
lipid bilayers [VR09]. Nevertheless, the resolving power of 2D-IR spec- 
troscopies can be limited, in some cases, by persisting ambiguities possi- 
bly due to the coexistence of several molecular conformations [WPH* 02]. 


98 


Francesco Muniz Miranda 


Thus, MD simulations can be proficiently used to explain and elucidate 
the fine features of the 2D-IR spectra [WH00, WPH*02]. 

In a previous paper [VR09], Volkov and Righini used two-colour 2D-IR 
spectroscopy to determine the depth of penetration of the MrG backbone 
into the polar fraction of a phospholipid bilayer made of 1-palmitoyl-2- 
linoleyl phosphatidylcholine (PLPC) molecules. The experiment consisted 
in exciting the water stretching modes and detecting the induced pertur- 
bations in the spectral region of the C=O vibrations. 

In Ref. [VCMMR11] we specifically tried to obtain information about 
the MrG backbone conformation and its localisation in the PLPC bilayer 
with the essential support of (classical) MD simulations and (static) quan- 
tum mechanical calculations. With these combined techniques we had 
a sufficiently exhaustive picture of the structure of the guest molecule 
(MrG) and of its interaction with the neighbouring moieties and of the 
local restructuring of the host bilayer (PLPC). 


5.1 Computational Protocols 


Molecular Dynamics Simulation 


In order to simulate a system composed of ~ 20000 atoms, classical MD 
has to be employed. The simulations have been carried out using the 
ORAC code [MSC* 10], because of the implemented replica exchange method 
(REM) that speed up phase space sampling (see Appendix B.2). In order 
to simulate the behaviour of peptides and PLPC membranes near realistic 
biological conditions, MD simulations are performed in the isothermal- 
isobaric NPT ensemble using an orthorhombic simulation box with PBC. 
Temperature was set at 298 K with a Nosé-Hoover thermostat [Hoo85], 
and pressure at 0.1 MPa, adopting the Parrinello-Rahman Lagrangian 
[PR80] as implemented in ORAC [MP98] with fixed cell angles. Lorentz- 
Berthelot mixing rules (see Section 2.2.5) have been used to model Lennard- 
Jones interactions between different atom types. The AMBER99 [HAO*06] 
force field has been adopted. The phospholipids, dipeptides, and water 
molecules are completely flexible except for the covalent bonds involv- 
ing hydrogen atoms, which are kept rigid to save computations. The 
equations of motion have been integrated using the multiple time-step 
technique [TBM92] and the Ewald method with the smooth particle mesh 
algorithm [EPB*95] is applied to compute the electrostatic interactions. 
The simulated sample consists of 2877 water molecules and of two layers, 
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each made of 63 PLPC molecules and one MrG molecule. 
The sample has been prepared in two steps. 


Figure 5.2: MD Simulation box. Cartesian axes are displayed on the upper-right 
corner. 


In the first one an equilibrated sample of a pure PLPC bilayer has 
been produced, starting from a previously equilibrated configuration ! of 
a hydrated PLPC bilayer made of 128 phospholipid molecules and 2460 
water molecules. The box size of this configuration is a parallelepiped of 
volume 79.666 x 54.478x 55.065 Ä?. 

To avoid possible inconsistencies due to unexpected peptide arrange- 
ment arising from periodic boundary conditions, the box dimension along 
the bilayer normal (hereafter, the z-axis) is increased by 3.5 Ä by adding 
417 water molecules (to get a total of 2877 water molecules). The resulting 
hydrated PLPC bilayer is then equilibrated once again through a standard 
5 nanoseconds MD simulation. Equilibration is verified by monitoring the 
potential energy, the simulation-box volume, and the stress tensor, which 


1The “equilibrated configuration” actually is the final configuration of 
a MD simulation performed by Herrenbauer et al. [HTP95], available at 
http://moose.bio.ucalgary.ca. 
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is found to be almost isotropic. 


Then, Dr. R. Chelli (Chemistry Dept., Univ of Florence) built the initial 
configuration of the MrG-PLPC sample by replacing two PLPC molecules, 
randomly chosen on the two leaflets of the pure PLPC sample, with MrG 
molecules. The replacement of PLPC molecules with MrG molecules does 
not introduce relevant frustration in the system, because of the similar- 
ity between these two types of molecules. System equilibration is then 
achieved with a 2 nanosecond NPT MD simulation. 


With this procedure the PLPC molecules can be assumed equilibrated, 
while the dipeptide molecules may still preserve memory of their initial 
configuration, due to their rugged energy landscape possessing several 
local minima where the peptide can be trapped during the simulation. 


Therefore, in order to accelerate the conformational sampling of the 
dipeptide molecules, also considering the small number of MrG molecules, 
we employed the parallel tempering technique, whose MD version is 
known as “replica exchange method” (REM) [SO99]. The specific ver- 
sion implemented in ORAC and adopted for these simulations is the 
hamiltonian-REM, discussed in Appendix B.2. In these simulations, the 
hamiltonian-REM scaling is applied only to the intra-molecular potential 
energy of both MrG molecules, apart from the harmonic stretching and 
bending potential terms. 16 replicas as been adopted, swapping among 
the following sequence of states: c = 1.000, 0.898, 0.807, 0.725, 0.651, 0.585, 
0.525, 0.472, 0.424, 0.381, 0.342, 0.307, 0.276, 0.248, 0.223, and 0.200. Replica 
exchanges are attempted every 8 femtoseconds. The replica configurations 
characterised by c = 1 have unscaled potential energy and therefore corre- 
spond to configurations sampled at 298 K, i.e. room temperature. Instead, 
the replica configurations characterised by, e.g., c = 0.2, correspond to 
states where the intra-molecular potential energy of the MrG molecules 
(excluding harmonic stretching and bending terms) have a “virtual” tem- 
perature of T/c = 298/0.2 = 1490 K. Since each replica can move through 
the c ensemble, the MrG molecules can easily overcome the internal en- 
ergy barriers with significant improvement of the conformational sam- 
pling. Before performing the production REM run, an equilibration of 
1.68 nanosecond is carried out. Atomic configurations are recorded every 
0.4 picosecond during a production run of 5.4 nanosecond. The statistical 
analysis is done on the c = 1 system configurations. The Ramachandran 
plots I have calculated for c = 0.2 configurations (i.e., at high virtual tem- 
perature) show that the sampling involves a much broader conformational 
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Figure 5.3: Left Panel: Ramachandran plot calculated using MrG configurations 
taken from replicas corresponding to c = 1 (the virtual temperature of the tor- 
sional and inter-molecular potential energies of MrG is 298/c = 298K). Right 
panel: Ramachandran plot calculated using MrG configurations taken from repli- 
cas corresponding to c = 0.2 (the virtual temperature of the torsional and inter- 
molecular potential energies of MrG is 298/c = 1490K). In this case the sampled 
conformational space is much wider. 


space for both MrG molecules, as shown in fig. 5.3. 


180 
100- 
0 
100: 
-180 


-180 180 -100 -S€ 


phi (degrees) phi (degrees) 


180 


psi (degrees) 


I have used the distribution of Ramachandran ® and Y angles? to 
check qualitatively the effectiveness of the MD parameters and of the equi- 
libration protocol. Unfortunately, an entire hamiltonian-REM MD trajec- 
tory had to be discarded due to the evident non-equilibrium behaviour 
of the internal dynamics of the two dipeptide molecules (see Section 5.3.2 
for more details). In fact, the dynamics is very dependent on the choice of 
the partial charges for the electrostatic potential, and the original protocol 
adopted to obtain these latter was unsuitable. The dipeptide was opti- 
mised with the Gaussian09 [FTS*b] program at the B3LYP/6-31++G(d,p) 
level of theory to obtain partial ESP charges, but the possibility of elec- 
tronic rearrangement due to internal torsion was not considered by this 


?The ® angle represents torsions around the N(H)—C(O) bond, while Y represents 
torsions around the C(O)—C, bond. 


102 


Francesco Muniz Miranda 


preliminary and inaccurate approach, and only the “folded” conformation 
reported in fig. 5.4 was found in the trajectory, thus invalidating the first 
hamiltonian-REM MD simulation. At a second attempt, partial charges 


unfolded 


folded 


Figure 5.4: Two possible conformations of the dipeptide with Ramachandran 
torsional angles reported. Green atoms are used to represents the aliphatic chain. 


obtained optimising the two constituent amino acids separately have been 
employed. Therefore, a second run of MD simulations has been carried 
out to achieve a correct equilibration and behaviour in the Ramachandran 
space. 


Ab Initio Calculations 


The quantum mechanical calculations are based on the density functional 
theory, as implemented in the Gaussian09 [FTS*b] program using the 
B3LYP functional and the 6-31++G(d,p) basis set. These calculations are 
due to Dr. V Volkov (then at LENS, now in Mainz, Max Planck Institut für 
Polymerforschung) To make these computations feasible, the myristoyl 
tail of the dipeptide was replaced with a methyl. The molecular con- 
formation of the peptide backbone has been represented in terms of the 
Ramachandran ® and Y angles ranging from (-180°, -180°) to (-100°, 180°), 
with steps of 20° for both angles. The structures of these configurations 
are optimised by keeping the ® and Y angles fixed. For all optimised 
configurations, the coupling constants, the angle between the transition 
dipole moments, and the ratio between the IR intensities of the two amide 
Imodes have been computed. These quantities are then given as functions 
of the ® and Y angles within the Ramachandran space. To be specific, the 
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values of the coupling constants between interacting C=O oscillators are 
obtained from ab initio calculations by reconstructing the Hessian matrix 
for the normal modes in the subspace of the amide I modes [TT98]. The 
method described in Ref. [CHC03], based on the Cartesian displacements 
of the normal modes, has been adopted to retrieve the coupling. 


5.2 Brief Survey of Experimental Methods 


Fig. 5.5 represents the linear IR and nonlinear 2D-IR spectra of the sam- 
ple in the spectral range of the C=O stretching modes, recorded by Dr. 
Volkov. 
The 2D-IR spectra, in parallel and perpendicular polarisation, are recorded 

at 500 femtoseconds pump-probe delay time. The band centred at 1734 
cm! is due to the vibration of the C=O groups of PLPC. The resonances 
of the C-labelled and of the native C=O of MrG are clearly separated 
in frequency (1587 cm! and at 1632 cm™!, respectively). The latter two 
bands are much narrower than the PLPC resonance, whose full width at 
half-maximum (FWHM) is about 40 cm~'!. For a similar phospholipid 
bilayer (dimyristoylphosphatidylcholine) [VCZ*07], the C=O stretching 
bandwidth was reduced by approximately 25% upon removal of the ex- 
citonic coupling by isotopic substitution. The 2D-IR spectrum shows a 
well-defined cross-peak between the two amide I bands of the dipeptide 
(pump centred at 1598 cm!, probe at 1632 cm). In Ref. [VR09], Volkov 
and Righini showed that the MrG molecule is embedded in the polar 
layer of the membrane. The resulting inter-molecular host-guest coupling 
is confirmed by the appearance of cross-peaks between the two amide I 
transitions and the PLPC C=O stretching band (pump at 1598 and 1632 
cm!, probe at 1740 cm~'). The presence of rather intense intra- and 
inter-molecular cross-peaks allows exploring structural properties of MrG 
anchored to the membrane. 


5.3 Findings 


5.3.1 Experimental 


The (almost) Lorentzian lineshapes of the two amide I bands in the FTIR 
spectrum correspond to the essentially vertical alignment of the same two 
resonances in the 2D-IR spectrum of fig. 5.5. No detectable inhomoge- 
neous character shows up in the linear and nonlinear spectra. Actually, 
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Figure 5.5: (A) IR spectrum of MrG in PLPC membrane. (B,C) Nonlinear 2D- 
IR spectra of MrG in PLPC membrane measured in parallel and perpendicular 
pump-probe polarisations, respectively. (D) Zoomed view of the amide I reso- 
nances of panel B. 
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in the analysis of the 2D-IR lineshape, one has to take into account the 
limitations due to the pump bandwidth (17 cm!) and to the actual spec- 
tral resolution of the detection system (5 cm~!). On this basis, it cannot 
be excluded that some small inhomogeneous contribution, on the order 
of very few wavenumbers, is present, resulting in a hardly detectable tilt 
toward the diagonal of the nodal line separating the negative and positive 
lobes of the 2D-IR resonances. An additional element to be considered 
is the relatively long pump pulse used in the experiment. The 17 cm! 
bandwidth corresponds to a pulse duration of about 800 femtoseconds. 
With this low time resolution, a very fast (typically, < 300 femtoseconds) 
loss of memory of the vibrational excitation would lead to smearing out 
any trace of inhomogeneity, even at very short pump-probe delay times. 
In fig. 5.5, panel D, the lines represent the orientation of the nodal lines for 
the two amide I bands. Within the experimental sensitivity, the lines are 
essentially vertical. Under this respect, it is striking the difference with the 
spectral properties of, for instance, trialanine in water [WPH* 02]. In fact, 
in that case, a partial inhomogeneous character is observed, attributed to 
the co-presence of different intra-molecular conformational states of tri- 
alanine. The essentially homogeneous character of the amide I resonances 
of MrG suggests that this dipeptide is present in the phospholipid envi- 
ronment with one specific, largely prevalent conformation. The different 
behaviours of MrG in the membrane and of trialanine in water is at first 
sight contra-intuitive. In fact, the smaller size of the two glycine groups of 
MrG should in principle give rise to a shallower backbone torsional poten- 
tial and, consequently, to an easier access to different conformations. The 
very role of the anisotropic environment provided by phospholipid matrix 
and, possibly, the constraint due to the anchor tail, then come into play 
in “freezing” the conformational mobility of MrG. In the following, MD 
simulations and quantum mechanical calculations provide a substantial 
support to this interpretation. 


5.3.2 Ab Initio Calculations and Molecular Dynamics 


The coupling constant between the amide I modes of the MrG, obtained 
from the relative intensities of the diagonal resonances and the cross- 
peaks, is 5 + 1 cm! [HLDH99]. From the cross-peak anisotropy, fol- 
lowing the method proposed by Hochstrasser et al. [HLDH99], a value of 
43 (or 180-43) + 8° for the angle between the transition dipole moments 
of C-labelled and native amide I modes can be obtained, as reported in 
fig. 5.6. 
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Figure 5.6: Panel A: distribution of the angle between the amide I transition 
dipole moments of MrG obtained from a set of 48 non-linear spectra obtained 
with different pump pulse frequencies and for 4 different samples. Panel B: dis- 
tribution of the coupling constant between the amide I modes of MrG estimated 
from the same experiments. 


The confidence limits for these quantities were derived from a statisti- 
cal analysis of a set of 48 nonlinear spectra obtained with different pump 
pulse frequencies and for four different samples. Then, by comparing the 
experimentally derived data with the calculated distributions in the Ra- 
machandran ® and Y space, the compatible angular regions have been 
extracted. 

The relative IR intensity of the two amide I bands is the additional 
property that has been used to restrict the range of possible backbone 
conformations. According to our ab initio calculated IR spectra, the inten- 
sity ratio of the two transitions changes with the ® and Y angles. In the 
experimental IR spectrum, the intensity ratio of the "°C and !?C amide I 
bands is Ih3/ ha = 1.78. 

Note, however, that the calculations are done for a molecule in vac- 
uum, since for this research ab initio MD trajectories have not been col- 
lected due to the complexity of the system. Considering that the effect of 
the membrane environment on the IR intensities may be not negligible, 
we accept all the conformations that satisfy the condition [3/2 > 1. Fig. 
5.7 gives the calculated ®/Y maps (panels A, B, and C) and the regions 
corresponding to compatible values of the coupling constant, of the an- 
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Figure 5.7: (A-C) Coupling constant, angle between the transition dipole mo- 
ments, and IR intensity ratio between 13C and PC amide I modes of MrG cal- 
culated ab initio as a function of the ® and Y angles. (D-F) The coloured areas 
are the ®/Y regions compatible with the experimental values of the coupling 
constant, the angle between the transition dipole moments, and the IR intensity 


ratio, respectively. 
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gle between transition dipole moments and of the intensity ratio (panels 
D, E, and F). Fig. 5.8, panel A, represents the result of the combination 
of the three constraints on the structural maps. The acceptable region is 
very limited and corresponds to a distribution of structures of essentially 
unfolded character, with ® and Y angles varying in the degree range 
[-100, —180] and [—190, —120], respectively. 
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Figure 5.8: (A) Region of the ®/Y space obtained from the intersection of the 
D, E, and F plots of fig. 5.7. (B) Probability distribution of the ® and Y angles 
obtained from the REM-MD simulation. The two molecular structures on the 
right are representations of the two main backbone conformations (the green 
sphere represents the alkyl chain). 


Fig. 5.8, panel B, shows the conformational distribution of MrG in 
terms of the ® and ¥ angles according to the REM-MD simulation. 

The ®/¥ distribution predicted by the simulation is broader. We have 
singled out two major structural configurations. The first one is essentially 
unfolded, with ® and Y angles around -90 + 20° and -180 + 20°, respec- 
tively. It is compatible with the values of the dihedral angles obtained 
from the experiments (fig. 5.8, panel A). The second structural motif is 
close to a 310/a folded structure, with ® and Y angles around -95 + 20° 
and +10 + 50°, respectively. The unfolded structure has the higher proba- 
bility (64%), compared to the folded one (36%). Representative structures 
of the two main conformers are given in fig. 5.8. 

Dr. Volkov and Prof. Righini already noticed that both linear and 2D- 
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IR spectra in the amide I region are characterised by very low, practically 
undetectable, inhomogeneity. 

In order to verify whether the conformational distribution predicted 
by the REMD simulation is compatible with the experimental results, we 
calculated ab initio the IR spectra of various folded and unfolded confor- 
mations of the dipeptide (using the Gaussian 09 code [FTStb]). More 
precisely, we calculated the IR frequencies and intensities for a grid of 
points in the ®/Y space (separated by 20°) according to the distribution 
of figure 5.8. Thus, each point in the ®/Y grid contributes to the overall 
band profile with a Lorentzian line centred at the calculated frequency, 
with an intensity proportional to the product between the ab initio calcu- 
lated intensity and the probability of the corresponding ®/Y conforma- 
tional state estimated in the REMD simulation. The spectral profiles of 
the folded and unfolded structures are then obtained as superposition of 
the relevant contributions. The calculated frequencies are scaled by 0.93 
in order to fit the experimental spectrum. 
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Figure 5.9: (A) IR spectrum of MrG in PLPC membrane (zoomed view of fig 5.5, 
panel A). (B) IR spectrum of the folded MrG conformations estimated with ab 
initio calculations. (C) IR spectrum of the unfolded MrG conformations estimated 
with ab initio calculations. 


In fig. 5.9, panels B and C, the reconstructed spectra for the folded and 
unfolded conformations of MrG, respectively, are reported. The compar- 
ison of the two calculated spectral profiles shows that, in the case of the 
310/« folded structure, the two components have different intensities and, 
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most importantly, the low frequency band is substantially broader. This 
latter feature is a consequence of the much higher value of the vibrational 
coupling between the two C=O in the folded geometry. This calculated 
spectrum is inconsistent with the experimental profile. 


On the contrary, the spectrum calculated for the unfolded structures 
(fig. 5.9, panel C) agrees well with the experiment. Thus, the REM-MD 
simulation agrees acceptably with the experiments in predicting the un- 
folded backbone conformation as the most probable one. An incorrect 
choice of the electrostatic potential charges forced us to reject entirely the 
results of a previous simulation because it just predicted the presence of 
the folded conformation. 


The presence of about 36% peptide molecules with folded backbone 
conformation appears instead to be overestimated by the simulation. The 
profiles of the linear IR and 2D-IR spectra indicate that the abundance of 
this conformation should be much less. The weak shoulders that appear 
at the bases of the two main bands are compatible with an abundance of 
other conformers on the order of 10-15%. In evaluating this discrepancy, 
one has to consider that the actual conformational distribution is the result 
of the subtle competition of the intra-molecular (basically torsional) poten- 
tial and of the inter-molecular peptide-phospholipid interactions. In our 
simulation we have chosen to adopt the AMBER-99 potential [HAOF06], 
developed and widely employed for peptides. In principle, it would be 
possible to remodel the intra-molecular potential of MrG to account for 
the conformational distribution estimated from the experiments. Anyway, 
the required computational effort would be well beyond the aim of this 
research, as already hinted in Section 2.2.7 about polarisable force fields. 


The calculated spectrum of fig. 5.9, panel C, also provides additional 
arguments for interpreting the unusually narrow line-shapes of the two 
amide I transitions reported in fig. 5.5 (also reported and magnified in fig. 
5.9, panel A). As already noticed, the positive and negative lobes of the 
2D-IR bands are essentially vertical also at very early times, although the 
presence of a very small tilt toward the diagonal direction cannot be com- 
pletely excluded, at least for the high frequency band. The bandwidth is 
about 14 cm!, in agreement with the value measured in the linear spec- 
trum. On the other hand, fig. 5.9, panel C, shows that the calculated IR 
band-shapes, resulting from the frequency distribution of the unfolded 
conformation, are in qualitative agreement with the experiment, suggest- 
ing that the leading term determining the overall band shape is the (rather 
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narrow) distribution of the backbone dihedral angles of the unfolded con- 
formers, with only a minor contribution from the environment. 


The bandwidths of the calculated spectrum of fig. 5.9, panel C, are, 
however, broader (about 25 cm!) than the experimental ones. This dis- 
crepancy suggests that some motional narrowing effect has to be con- 
sidered. With the time resolution of the experiment, very fast (say 300 
femtoseconds, or less) spectral diffusion would make a limited inhomoge- 
neous contribution (about 10 cm!, in this case) practically undetectable. 
In the assumption that only the unfolded conformation of MrG is ap- 
preciably populated, two main processes can provide the required fast 
frequency modulation: the breaking and making of H-bonds between the 
C=O groups and the water molecules present at the membrane interface, 
and the rapid small fluctuations of the dipeptide backbone around its 
equilibrium unfolded structure. 


To check these possibilities, we have relied upon MD data analysis. 


To this aim we have calculated several structural properties, espe- 
cially to get information about the mutual arrangement of MrG and PLPC 
molecules in the membrane scaffold. In figure 5.10, panel A, I report the 
normalised distribution functions of the carbonyl-oxygen atoms and of 
the chain end-carbon atoms of MrG and PLPC along the z-axis. Over- 
all, the observable features are consistent with the formation of a bilayer 
membrane where the hydrocarbon tails of MrG and PLPC are globally 
localised between the hydrophilic moieties represented by the carbonyl 
oxygen atoms. Moreover it is worthwhile to observe that, in spite of 
the significant structural differences of MrG and PLPC molecules, their 
distribution functions are similar in both, position of the maximum and 
broadening. 


Additional insights are obtained from the distributions of the angle 
formed by the polar-head backbones and by the hydrocarbon tails with 
the z- axis (normal to the bilayer). We arbitrarily, but reasonably, defined 
the direction of the polar-head backbone of MrG as the vector linking the 
carbonyl- carbon atom next to the myristoyl tail to the other carbonyl- 
carbon atom. By analogy we defined the direction of the PLPC polar head 
as the vector linking the sn2 carbonyl-carbon atom to the sn1 carbonyl- 
carbon atom. The direction of the hydrocarbon tails is instead defined 
by the vector linking the end-carbon atom to the carbon atom bonded to 
the carbonyl group next to the chain. The normalised angle distribution 
functions are shown in figure 5.10, panel B. The distributions related to 
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the hydrocarbon tails are very similar, although the distribution related to 
the sn2-PLPC tail is slightly broader. 

Two effects probably enhance the propensity of the sn2-PLPC tail to 
form folded structures: its length (17 carbon atoms against 15 and 13 car- 
bon atoms of the sn1-PLPC and MrG tails, respectively) and the presence 
of non-conjugated sp2 carbon atoms which may favor the conformational 
flexibility of the chain. In general we noted that all chains are preferen- 
tially distributed along the z-axis with average angles ranging from 32° 
(sn1-PLPC and MrG tails) to 37° (sn2-PLPC tail). The angle distributions 
of MrG and PLPC polar heads are instead very different, the former be- 
ing globally shifted towards smaller angles. This is somehow expected 
by considering the strong difference in molecular structure. The relevant 
broadening of the MrG angle distribution can be ascribed to the smaller 
size of MrG polar head with respect to PLPC, which may reduce hinder- 
ing effects and hence allow the access to larger configurational space. In 
fig. 5.10, panel C, I reported the distribution functions of the distances 
between the carbon atoms defining the hydrocarbon tails (see above for 
definition). The picture coming out from this quantity is consistent with 
a membrane made of globally unfolded molecules. The shift of the dis- 
tribution functions is clearly due to the difference in chain length. The 
distance distribution related to the sn2-PLPC tail appears the broadest are 
thus showing (as previously remarked) a greater inclination to folding. 


Short MD Simulations and Correlation Times 


In order to characterise the H-bonding dynamics of the C=O groups, 40 
MD simulations of 100 picoseconds each have been performed starting 
from samples extracted from the REMD simulation at 298K, chosen so 
that both dipeptides are in an unfolded conformation. Apart from the 
REM procedure, all other parameters are the same as those employed in 
the REM-MD production run previously described. Atomic coordinates 
have been recorded every 32 femtoseconds. 

The H-bond dynamics between MrG/PLPC C=O groups and water 
has been probed through a simplified “Hydrogen-Bonding Function” fug 
which holds 1 for 0 < t < T and 0 otherwise, where T is the lifetime of the 
H-bond established between a C=O group and a given water molecule. A 
H-bond is considered broken if it is not reformed after a time of 256 fem- 
toseconds. This allows considering two H-bonding events separated by a 
short time gap with no interaction (a time comparable to inter-molecular 
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Figure 5.10: Panel A: normalised distribution functions of the carbonyl-oxygen 
atoms (carbonyl-O) and of the chain end- carbon atoms (End-C) of MrG and 
PLPC along the z-axis. Panel B: normalized distribution functions of the angle 
formed by the polar-head backbones and by the hydrocarbon tails of MrG and 
PLPC with the z-axis. The direction of the polar-head backbone of MrG corre- 
sponds to the vector linking the carbonyl-carbon atom next to the myristoyl tail 
(Ca) to the other carbonyl-carbon atom (Cb). The direction of the PLPC polar 
head corresponds to the vector linking the sn2 carbonyl-carbon atom (Csn2) to 
the sn1 carbonyl-carbon atom (Csn1). The direction of the hydrocarbon tails cor- 
responds to the vector linking the end-carbon atom to the carbon atom bonded 
to the carbonyl group next to the hydrocarbon chain. Panel C: normalised dis- 
tribution functions of the distance between the carbon atoms which define the 
hydrocarbon tails (see description of panel B). 
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vibrational motions) as a unique interaction event. The occurrence of a H- 
bond is established on the basis of a geometrical criterion, namely when 
the distance between the oxygen atom of the MrG/PLPC C=O group is 
smaller than 2.4 Ä (this value corresponds approximately to the first min- 
imum of the O. - - H pair distribution function). The function fpg has been 
calculated for all C=O-water couples of all 40 MD simulations to achieve 
a good statistical significance. In order to get the H-bond lifetime, the 
normalised autocorrelation functions 


(fas (t) fess (0)) 
(fein) 


of all H-bonds have been averaged, obtaining the curves reported in fig. 
5.11. The integral correlation times corresponding to the 3 types of H- 


H-bond involving water and CO close to MrG tail 
H-bond involving water and CO far from MrG tail 
H-bond involving water and CO of PLPC 


0 1 2 3 4 5 6 
Time (ps) 


Figure 5.11: H-bond correlation functions for 3 types of H-bond involving the 
CO groups (of MrG or PLPC) and water (see legend). 


bonds are 1.3 picoseconds (H-bond involving water and C=O of PLPC), 
0.7 picoseconds (H-bond involving water and C=O far from the MrG tail) 
and 2.3 picoseconds (H-bond involving water and C=O close to the MrG 
tail). 

Therefore, it seems unlikely that the hydrogen bond dynamics can be 
responsible for the rapid frequency fluctuations suggested by the exper- 
imental results. It seems more reasonable to attribute such fast modula- 
tion to small amplitude fluctuations of the MrG backbone. This process 
was found to contribute to the narrowing of the IR lineshape of trialanine 
amide I transition in water [WPH*02]. 
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Dipeptide Penetration into the Membrane 


The partially excitonic character of the delocalised C=O vibrations at the 
interface of the phospholipid membrane is a well known feature [VCZ +07]. 
In particular, as the dipeptide is present as an impurity in the membrane, 
and since the amide I modes are spectrally separated from the frequen- 
cies of the PLPC carbonyl modes, it can be securely stated that the spectral 
features of the cross-peaks between the amide I transitions and the PLPC 
C=O band can be interpreted in terms of inter-molecular pairwise inter- 
actions. In other words, the cross-peaks measured at the PLPC carbonyl 
stretching frequency upon excitation of the amide I modes provide infor- 
mation on the average angle between the transition dipole moments of 
MrG and of the neighboring PLPC molecules. From the anisotropy mea- 
surement that Dr. Volkov performed for the current research, the values 
of 21 + 14° and 33 + 6° for the ’C and !?C moieties have been obtained, 
respectively (or equivalently 159° and 147°). To rationalise these values, 
I calculated the weighted radial-angular distribution function related to 
MrG and PLPC carbonyl pairs, which are shown in fig. 5.12. For both 
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Figure 5.12: Weighted pair radial-angular distribution functions of the PLPC and 
MrG carbonyl groups. (A) Function related to PLPC C=O and MrG BCO pairs. 
(B) Function related to PLPC C=O and MrG !? C=O pairs. The vertical dashed 


lines represent the values of the angles derived from the 2D-IR experiment; the 
corresponding errors are indicated by the horizontal bars. 
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MrG amide groups, the function is sharply peaked at the distance of 5 
Ä. In the case of the carbonyl next to the anchor tail (fig. 5.12, panel A) 
also the angular distribution is quite narrow (between 20° and 50°). For 
the distal C=O instead, the angular distribution is definitely broader. The 
vertical dashed lines in the same figure correspond to the angles derived 
from the experiment. The calculated values are in general compatible with 
those obtained from the 2D-IR spectra, although the angular distribution 
calculated for the distal carbonyl is centred at a larger angle. In any case, 
it should be noted that the experimental error affecting the angular value 
of the distal carbonyl is much larger than that for the vicinal one (see error 
bars in fig. 5.12). 


The depth of penetration of MrG into the PLPC membrane was the 
subject of a previous experimental work of Volkov and Righini [VR09]. 
That experiment consisted in exciting the water stretching modes and 
measuring the perturbation induced in the amide I transitions of MrG. 
The meter stick used for determining the vertical localisation of MrG is 
provided by the density profile of the water molecules across the mem- 
brane, which is known from neutron diffraction measurements [JW89]. 
In fact, the appearance of cross-peaks in the two-color 2D-IR spectra is a 
consequence of the hydrogen bonding between the C=O groups and the 
surrounding water molecules. 


The intensity ratio between the cross-peaks involving the amide I reso- 
nances of MrG and that corresponding to the PLPC C=O (whose localisa- 
tion in the bilayer thickness is known from neutron scattering data [JW89]) 
is taken as a measure of the relative degree of hydration of the MrG car- 
bonyls with respect to those of PLPC. From the known vertical profile of 
the water density, one can then derive the localisation of the C=O groups 
of MrG. The results obtained following this procedure are summarised in 
fig. 5.13. The two C=O groups of the dipeptide are found to lay, in aver- 
age, above those of PLPC molecules: more precisely, the vertical distances 
from the average position of PLPC carbonyls was 2 A for the C=O next to 
the MrG tail, and 4.4 A for the distal C=O. 


Clearly, only the hydrated C=O can be detected by this method. The 
analysis of the REMD simulation described in the present paper provides 
a reliable test of this conclusion. In fig. 5.13, panel B, the blue and red 
lines are obtained by multiplying the calculated distribution of the oxygen 
atoms of MrG by the density of water oxygen atoms as obtained from 
the REMD simulation. The black curve is the distribution of all C=O 
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oxygens of the PLPC molecules. In agreement with the results of our two- 
color pump-probe experiment, the REMD simulation predicts for the two 
amide groups an average localisation above that of the PLPC carbonyls. 
The distances are slightly smaller (1.9 and 3.1 Ä) than those deduced from 
the experiment. The method adopted in Ref. [VR09] and used here to 
calculate the curves in fig. 5.13, panel B, is based on the assumption that 
the water density depends only on the value of the z-coordinate across 
the membrane thickness. Still, from the simulation data we can exactly 
calculate the degree of hydration of the oxygen atoms of MrG and of the 
PLPC carbonyls in the sample. 


Fig. 5.13, panel C, reports the distribution of the MrG oxygens hydro- 
gen bonded to water molecules. Their average position is still above that 
of the PLPC carbonyls, but their distance from that reference altitude is 
even smaller: 0.8 and 2.3 A. The obvious conclusion is that the assumption 
of the water density depending only on the z coordinate is not completely 
justified. 

In order to verify this point, I have calculated the ratio between the 
water densities around MrG and PLPC molecules. In particular, I have 
consider the water density in xy planes parallel to the membrane surface, 
as a function of the altitude in the perpendicular (z) direction. The origin 
of the z axis is placed at the average altitude of the carbonyl oxygens; the 
abscissa represents the distance in the xy plane from the carbonyl oxygens 
of MrG/PLPC. The ratio of the two densities is given in fig. 5.14, panel A. 
The figure shows that there is definitely more water near the MrG oxygens 
than near the PLPC carbonyl oxygens. 


The reason is made clearer from fig. 5.14, panel B: this plot is calcu- 
lated with the same procedure of fig. 5.14, panel A, considering all atoms 
but excluding those belonging to water molecules. It is apparent that the 
two plots of fig. 5.14 are somehow complementary. The increased density 
of water molecules corresponds to areas of the membrane (those above 
the guest MrG molecules) where the density of the polar groups of PLPC 
(namely phosphate and choline) is lower. The MrG molecule entering the 
membrane polar surface creates the pathway for a deeper penetration of 
water inside the membrane. This results in a different hydration of the 
C=O groups, the amide groups of MrG being more hydrated than the 
C=O of PLPC located at the same altitude. This is confirmed by the mean 
hydrogen bond number calculated for the MrG and PLPC carbonyls: the 
values are 0.87 for the former and 0.54 for the latter. If this picture corre- 
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Figure 5.13: (A) Atomic densities of water (green line) and PLPC carbonyl (black 
line) from neutron scattering data. The blue and red dotted lines are the alti- 
tudes of the °C and !?C carbonyls of MrG, as deduced from the experiment of 
Ref [VR09]. (B) Distribution of all PLPC carbonyl oxygen atoms obtained from 
the REMD simulation (black line); distribution of the water oxygen atoms from 
the REMD simulation (green line); blue line shows the function calculated from 
REMD simulation by multiplying the distribution of the oxygen atoms of the 13C 
labeled carbonyl of MrG with the distribution of water oxygen atoms; red line 
shows the same for the oxygen MrG native carbonyls. (C) Distribution of all 
PLPC carbonyl oxygen atoms obtained from the REMD simulation (black line); 
the blue and red curves represent the calculated distributions of the oxygens of 
the MRG carbonyls H-bonded to water molecules. The arrows indicate the av- 
erage altitudes of the two distributions. For graphical reasons, in panels B and 
C the vertical axes are up-shifted to make the maxima of the experimental and 
calculated densities of the PLPC carbonyls coincident. 
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Figure 5.14: (A) Ratio between the densities of water oxygen atoms around MrG 
and PLPC carbonyl oxygens, respectively, as a function of the depth into the 
membrane of the MrG/PLPC carbonyl oxygen (distance along z axis) and of the 
distance in the membrane plane between water oxygen and MrG/PLPC carbonyl 
oxygen (distance in the xy plane). (B) Same ratio as in panel A, but considering 
all nonwater atoms. 


sponds to the structure of the real membrane, one should expect that the 
method adopted in Ref. [VR09] overestimates the average distance of the 
dipeptide from the centre of the membrane. 


The structure and conformation of the anchor tail of MrG, in relation 
to that of PLPC are not directly observable in Dr. Volkov’s experiments. 
It is, however, of some interest to analyse, under this respect, the sam- 
ple obtained from the REM-MD simulation, which basically tell us what 
the experiments keep for themselves. To this purpose, I consider several 
structural properties, concerning both the anchor chain conformation and 
the mutual arrangement of the aliphatic tails of MrG and PLPC molecules. 


Fig. 5.15, panel A, shows the distribution functions of the carbonyl- 
oxygen atoms and of the chain end-carbon atoms of MrG and PLPC along 
the z-axis of the membrane during the simulation. It is noteworthy that 
the terminal carbon atom of the MrG anchor is centred near z = 0 and is 
very similar to that of the corresponding carbon atoms of the PLPC hy- 
drocarbon tails. This implies that, on average, the anchor chain of MrG is 
parallel to the direction of the PLPC tails. This is confirmed by the plot 
of fig. 5.15, panel B, where the distribution functions of the distances be- 
tween the first (next to the carbonyl group) and the last carbon atoms of 
the hydrocarbon tails of MrG and PLPC molecules (end-to-end distance) 
is shown. This is a simple criterion for quantifying the degree of folding 


120 


Francesco Muniz Miranda 


0.4 


“ A — MG B 
2 0.12 End-C PLEC 0.3 ons 
S Carbonyl-O — rG tai 
300 © 7 i ‘| al —  snl-PLPC tail 
5 0.2 F — sn2-PLPC tail 
3 0.06 N 
E 
& 0.1 
3.0.03 
0 \ — 0 ae. 
-30 20° -10 0 10 20 30 0 4 8 20 


zIÄ C-C Ma / N 


Figure 5.15: (A) Normalised distribution functions of the carbonyl-oxygen atoms 
(carbonyl-O) and of the chain end-carbon atoms (End-C) of MrG and PLPC along 
the z-axis of the membrane. (B) Normalised distribution functions of the distance 
between the first (next to the carbonyl group) and the last carbon atoms of the 
MrG, sn1-PLPC and sn2-PLPC hydrocarbon tails (see fig. 5.1 for sn labelling 
notation). 


of the long aliphatic chains. The figure shows that only the linoleic (sn2) 
chain of PLPC, whose distribution function shows a non-negligible wing 
reaching values lower than 4 Å, is present in folded conformations. In 
fact, it is known that unsaturated hydrocarbon chains of lipidic bilayers 
can be found in folded geometries. On the contrary, the myristoyl chain of 
MrG and the palmitic (sn1) tail of PLPC have almost identical length dis- 
tributions (apart from the obvious difference due to the different number 
of constituent carbon atoms), corresponding to prevalent unfolded struc- 
tures. 


Thus, as it is apparent from this research, understanding the struc- 
tural properties of the association between a small peptide molecule and 
a phospholipid membrane is a complex task, which requires the coordi- 
nated use of a number of experimental and theoretical tools. 

The results of the classical MD simulations are in acceptable agree- 
ment with the structural properties deduced from the spectroscopic in- 
vestigation, thus substantiating and reinforcing the interpretation of the 
experimental data. Anyway, a lot of care should be given to correctly as- 
semble the force filed, since even minor inadvertence results in completely 
wrong trajectories, with no real physical meaning. The behaviour of the 
dipeptide, its depth of penetration into the membrane, its affinity for wa- 
ter and PLPC, all this is driven by H-bond breaking and formation. In fact, 
the autocorrelation function of a rather crude “H-bond function” allows 
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us to weight the relative stability of the different types of inter-molecular 
interactions. 

The rather successful application of the diversified and coordinated 
investigation methods adopted here, including nonlinear IR spectroscopic 
techniques and molecular modelling, represents an indication that similar 
approaches can be used to tackle structural problems of higher complex- 
ity, such as those involving polypeptides of medical and pharmacological 
interest. 

However, to obtain more detailed insights about the behaviour of molecules, 
ab initio MD proves a much more suitable probe, albeit at the price of a 
much more reduced molecular complexity. In particular, it is necessary 
to study molecular vibrations. The REM approach adopted for the dipep- 
tide, for example, does not even generate a truly time-dependent trajec- 
tory (see Appendix B.2) to obtain a vibrational spectrum via Fourier or 
wavelet transforms. 
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6. Simple Diols in Water and Acetonitrile 


Two simple diols in aqueous and organic solvent have been simulated 
at ab initio MD level of theory to be exploited as test case for the direct- 
space WT code (see Section 4.2.3), which I have developed to perform 
time-frequency analysis. 

In fact, WT use in theoretical and computational chemistry is gradually 
increasing. In particular, it has been applied to electronic structure calcula- 
tions [GI99, NG06, DAE03, CAJL93] and to MD [MSC08a, MSC08b, ACR96] 
also to obtain time-dependent vibrational frequencies [RW05]. 

In this research, WT has been mainly used to correlate vibrational fre- 
quencies of two molecular systems with other time-dependent structural 
properties. WT has been employed to improve the analysis of H-bond dy- 
namics and its relation with spectroscopic properties. Since in MD simula- 
tions it is always possible to associate to each simulation step a structural 
quantity, distance-frequency correlation showing how the VDoS changes 
with a particular H-bond length has been obtained. 

The two glycols simulated in aqueous environment are propane-1,3- 
diol 
(propanediol, PDO) and ethane-1,2-diol (ethylene glycol, EG), two homol- 
ogous compounds of industrial interest that interact with the aqueous 
solvent mainly by inter-molecular H-bonds. PDO is a transparent, non- 
toxic liquid glycol that can be obtained by fermentation of sugars and can 
replace other glycols in formulations where petroleum-free [OHS08] in- 
gredients are needed, which is relevant if the current hysteria about oil 
consumption, CO, emissions and so-called “global warming” is going to 
last. EG is the lower homologue of PDO and is mainly used as an an- 
tifreeze and a medium for convective heat transfer due to its low freezing 
point. 

For both PDO and EG, a deep knowledge of the H-bond interactions 
and of the solvation dynamics is needed to improve their use in industrial 
applications. These glycols have two interaction sites with the water sol- 
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vent and the analysis of the H-bond features are challenging due to the 
overlap of the solvent-solvent and solute-solvent contributions. 

For this reason, PDO has also been simulated in acetonitrile (ACN) 
solvent. In fact, a non-protic solvent like ACN allows measuring the OH 
stretching bands of PDO, thus providing an experimental reference. More- 
over, the behaviour of PDO in ACN is significantly different from that in 
aqueous solvent and requires a more complex approach to be properly 
elucidated, due to the occurrence of intra-molecular H-bonds even be- 
tween its two hydroxyl groups and not only with the solvent. 

These two hydroxyl sites are often referred separately in the follow- 
ing, typically as “site # 1” and “site # 2”. While this choice has no great 
physical meaning for the aqueous simulations, it is very important for the 
simulation in ACN due to the different non-ergodic behaviour of the two 
sites in this latter case. 


6.1 Propanediol and Ethylene Glycol in Water 


6.1.1 Computational Chemistry Details 


Figure 6.1: Simulation box of PDO in water. 


Ab initio MD simulations have been performed for PDO and EG in 
water. These simulations have been carried out with the CPMD pack- 
age [CPM] in the microcanonical ensemble (NVE) in conjunction with the 
BLYP [Bec88, LYP88] exchange and correlation functional, in cubic boxes 
of edge 12.7005 A (for PDO) and 12.6819 A (for EG), with periodic bound- 
ary conditions. The samples in the two simulation boxes are made of 
1 solute molecule and 64 D2O molecules (at the experimental density of 
deuterated water, ~ 1.106 g-cm~3). To give a quick idea of the size of this 
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system, also in comparison to that of the dipeptides in PLPC (see fig. 5.2), 
I report fig. 6.1 that displays the simulation box with PDO and solvent. 
Norm-conserving Martins-Troullier [TM91] pseudopotentials have been 
used along with Kleinman-Bylander [KB82] decomposition, and the plane 
waves expansions have been truncated at 85 Ry (this choice has been 
shown to be particularly effective in CPMD simulations [KMM*04)). 


ethane-1,2-diol ? Ref. [HJK05] BLYP ? | B3LYP ? | MP2" | CPMD © 


r (C-C) 1.514 1.527 1.517 1.513 1.523 

r (C—01) 1.433 1.454 1.434 1.431 1.457 

r (C—Od) 1.421 1.440 1.421 1.418 1.444 

r (O01-H) 0.961 0.971 0.961 0.961 0.974 

r (O2-H) 0.964 0.975 0.965 0.964 0.977 

r (Oi: --H) 2.331 2.434 2.399 2.323 2.418 

8 (O1—C-C) 106.1 106.7 106.8 106.2 106.8 

9 (O2—C-C) 111.2 112.2 112.0 111.1 112.2 

o (H—-01-C-C) -166.0 -164.8 | -166.7 | -164.0 | -166.6 

o (H—02—-C-C) -51.5 -54.9 -53.6 -51.9 -52.1 
propane-1,3-diol “ | Ref. [KMMR82] | BLYP > | B3LYP ? | MP2? | CPMD* 

r (C3—Ca) 1.514 1.533 1.523 1.518 1.528 

r (C4—Cs) 1.514 1.543 1.532 1.527 1:539 

r (C3-O}) 1.410 1.461 1.439 1.435 1.465 

r (C5-O;) 1.410 1.441 1.422 1.420 1.445 

r (O1-H) 1.040 0.972 0.962 0.962 0.975 

r (O2—H) 0.980 0.977 0.967 0.965 0,979 

r (O1. H) 1.753 2.051 2.034 2.000 2.052 

0 (O1—C3—C4) 112.0 108.4 108.6 108.0 108.9 

9 (O2—C5—C4) 108.0 113.4 113.4 113.0 114.0 

0 (Ca-C4-C;) 112.0 1139 113.8 112.9 114.1 

@ (H=O;—C3—Cy) 180.0 173.9 175.6 173.2 172.3 

p (H—02—C5—C4) -46.0 -46.3 -46.7 -48.4 -41.4 


Table 6.1: Structural parameters. * Bond lengths in A; angles in degrees. b DFT 
and MP2 calculations have been performed with the GAMESS [SBB*93] suite 
of programs in conjunction with 6-311++G(d,p) basis set. 
optimisations have been performed with the same parameters reported for the 


MD simulations. 


€ CPMD geometry 


In order to use a larger time-step (df of 5 a.u. ~ 0.12 femtoseconds), 
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hydrogen atoms have been replaced by deuterium atoms. 

The fictitious electronic mass has been set at 700 a.u. to allow for a 
good decoupling between electronic and nuclear degrees of freedom. The 
systems has been thermalised by scaling the atomic velocities for 2 pi- 
coseconds in order to keep the temperature at the value of 300 + 50 K and 
the trajectories for both systems have been collected for ~ 32 picoseconds. 

The data collected from the two simulations of the glycols have been 
compared with corresponding CPMD simulations in vacuum. The com- 
putational parameters used in vacuum simulations, such as cell param- 
eters, energy cut-off and time-step, are the same as those in water sim- 
ulations. To further check the accuracy in reproducing the structural 
parameters of isolated diols with the computational approach adopted, 
geometry optimisations have been carried out with the GAMESS pack- 
age [SBB*93] at the MP2 and density functional theory (DFT) level of 
theory (using the BLYP [Bec88, LYP88] and B3LYP [Bec93b, Bec93a, LYP88] 
exchange-correlation functionals) with 6-311++G(d,p) basis set. 

The comparison of experimental and calculated data is reported in 
table 6.1, showing only minor differences between all electrons and pseu- 
dopotentials calculations. All calculated normal modes frequencies are 
real and positive, ensuring that the systems are in their respective equilib- 
rium configurations. 


6.1.2 Structural and electronic properties 
Structural properties 


PDO and EG interact with water mainly through H-bonds, acting as both 
acceptor and donor. The H-bond can be characterised structurally through 
the O—D- --O distance r and the O—D---O angle 0. Figures 6.2 and 6.3 
report the pair radial g(r) and angular g(0) distribution functions for the 
acceptor and donor H-bonds, respectively. It can be seen that radial dis- 
tribution functions of the two glycols differ very slightly. The deep first 
minima are indicative of the slow mobility of the water molecules bound 
to the glycols. On the contrary, it can be noted that the g(@) functions 
for PDO are narrower, probably due to the fact that the two OD sites are 
farther away than in EG, allowing a more independent solvation dynam- 
ics. This also implies that the angular parameters can be more sensitive 
to structural details. In the present case, the narrower ¢(@) function is 
an evidence for a more stable H-bond in the PDO solution. The av- 
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Figure 6.2: Acceptor g(r) and g(0): red for “site 1”, blue for “site 2”; panel (a) 
and (b) for EG, panel (c) and (d) for PDO. 


erage coordination number for the OD sites, extracted from the g(r), is 
reported in table 6.2, and is always close to the expected value of 1 for the 
donor interactions, whereas the acceptor coordination number is larger 
and shows a significant spread as a consequence of weaker interactions. 
The differences occurring for site 1 and 2 of both diols arise from statis- 
tical uncertainty due to the finite temporal length of the simulations and 
the use of a single solute molecule for each system. 


site TY ste 2] 


EG donor 


EG acceptor 
PDO donor 
PDO acceptor 


Table 6.2: Average coordination numbers. 


To better characterise the strength and structural features of the H- 
bond, the weighted joint radial and angular distribution function g(r, 0) 
can be calculated by adopting the weighting function FB [PCRS03,PCS05, 
FPCS06] already discussed in Section 3.1.3 (eqn. 3.8). 
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Figure 6.3: Donor g(r) and g(0): red for “site 1”, blue for “site 2”; panel (a) and 
(b) for EG, panel (c) and (d) for PDO. 


The results shown in fig. 6.4 confirm that H-bonding in PDO is stronger, 
due to the lower spread of the joint distribution function along the angular 
parameters, since in the present simulations the angular conditions are a 
more stringent requirement for the hydrogen bond formation. A pictorial 
view related to the weighted g(r, 0) is shown in fig. 6.5, where the spatial 
distribution functions (SDF) of the OD groups of the water molecules are 
reported, displaying a three-dimensional description of the motion for the 
water molecules around the OD groups of the diols. The cyan-coloured 
isosurfaces (each point has to be visited 55 times) close to the glycols rep- 
resent the probability to find the D atoms of water around the O-sites of 
the solutes, whereas the orange-coloured ones represent the probability to 
find the O atoms of water around the alcoholic D atoms of the diols. These 
combined isosurfaces give a clear representation of the global mobility of 
the solvation cage. The spread of the isosurfaces for PDO is smaller than 
for EG, in line with the previous observations from the g(@) and g(r,0) 
functions. 
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Figure 6.4: Weighted acceptor g(r, 0) for the two OD sites of the two diols. 


Electronic structure analysis 


The polarisation effects due to the interaction of the glycols with the sol- 
vent have been analysed in MLWF centres [MV97,SMVP98], briefly dis- 
cussed in Section 3.2. The positions of the MLWF centres can be related 
to the electron pairs and give a direct picture of the electronic structure. 
In order to obtain useful insight on the electronic structure changes due 
to the interactions of the glycols with the solvent, the molecular dipole 
moment computed on the basis of the MLWF centres positions of PDO 
and EG has been monitored during the simulations. For this purpose 
the dipole moments of PDO and EG have been calculated on a series of 
time-equispaced configurations obtained by CPMD simulations (1 config- 
uration every 10* time-step, corresponding to 1.2 ps). The polarisation 
enhancement of glycols in solution has been obtained by adopting the fol- 
lowing computational strategy. For each configuration the MLWF centres 
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(a) EG, site 1 (b) EG, site 2 


(c) PDO, site 1 (d) PDO, site 2 


Figure 6.5: SDF of PDO and EG. Cyan and orange surfaces show the mobility of 
D and O atoms of H-bonded water molecules, respectively. 


have been obtained for all the system, for the diols and the H-bonded 
water molecules and for diols alone without geometry optimisation. 


In table 6.3 the dipole moments calculated by MLWF centres analysis 
by localising the valence shell electron doublets are reported. The dipole 
moment of the diols increases as reported in table 6.3, as a consequence 
of the interaction with the solvent. The time dependent data also show 
an evident correlation in the time evolution of the dipole moments for 
the three model systems. However, the solvent increases the dipole mo- 
ment of the diols and larger fluctuations are observed for EG, which being 
smaller in size is more sensitive to polarisation effects by the water cage. 
Furthermore, the dipole moment increase, due to solvent interaction, is 
larger in EG than in PDO for the same reason. 
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| isolated diol | diol + H-bonded water molecules 


EG 2.71 D 3.11 D 4.55 D 
PDO 2.05 D 2.18 D 3.87 D 


Table 6.3: average dipole moments for diols according to MLWF centres analysis 
on the three model systems. 


6.1.3 Time-frequency analysis 


To analyse PDO and EG vibrational response in aqueous environment, I 
have employed a combination of FI and WT methods. I have used the 
direct space implementation of WT (see Section 4.2.3), due to the fact that 
it was the first WT code I wrote. 

In Ref. 6.6 the VDoS of the water solutions, of the diols with the two 
closest molecules and of the diols by themselves in the water solutions are 
shown, obtained by FFT of the velocity autocorrelation functions. The OD 
stretching region (2100-2600 cm~'), which is the most interesting region 
of the spectrum, will be discussed in the following on the basis of WT 


analysis. 
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Figure 6.6: Vibrational densities of states: (i) VDoS of the complete simulation 
cells; (ii) VDOS of the diols and the two H-bonded water molecules; (iii) VDOS 
of the diols by themselves. 


The structure of the OD stretching band results from the convolution 
of solvent-solvent and solute-solvent interactions and I focus on the struc- 
ture, extracted from the simulation, of the VDoS of the solute molecule 
and of the two nearest water molecules bound to the diols. This can be 
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appreciated in more details analysing the VDoS extracted by FFT of the 
oscillations of the intra-molecular Aro_p functions, as reported in fig. 6.7, 
where the density of states obtained by simulations of the diols in solu- 
tion (panel a and b) and in vacuum (panel c and b) are reported. The 
redshift due to H-bonding interactions with the water solvent is evident, 
along with the broadening of the stretching band. In fig. 6.8, I report 
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Figure 6.7: VDOS for hydroxyl group stretching mode in water (panels a and b) 
and in vacuum (panels c and d) simulations. 


again fig. 4.8. While in Section 4.2.2 this figure was exploited to show 
how spectrum reconstruction changes with values of 7 parameter of the 
MG mother wavelet, without knowledge of the meaning of the input sig- 
nal, here I want to underline that the VDoS of a diol can be reproduced by 
both FFT and WT, using the inter-molecular Aro...p(t) fluctuation as input 
function. The inhomogeneous broadening of the band can be attributed 
to time changes of the structure of the solvent cage around the hydroxyl 
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Figure 6.8: Fourier and wavelet power spectrum of the displacement of inter- 
molecular Aro...p(t) function of EG, site 2. 


groups of the diols, which modulates the instantaneous vibrational fre- 
quency. 

To better enlighten the origin of the various frequency components, 
the trajectory of site 1 of PDO has been analysed by WT at specific time 
intervals. Fig. 6.9 shows the details of the trajectory referring to site 1 of 
PDO. It can be seen that along the first 5 picoseconds of simulation site 
1 is alternatively free or bound to water molecule # 22. Later, along the 
trajectory, the site is constantly bound to the same water molecule. The 
WT spectrum has been calculated at the simulation times of 1.2, 2.2, 3.8 
and 4.8 picoseconds (dotted red lines in fig. 6.9) and the results, reported 
in fig. 6.10, clearly shows that the redshift nicely correlates with the H- 
bonding character. The same correlation has been observed at all time- 
steps probed for both sites of the two glycols. 

An alternative and more immediate way of illustrating this behaviour 
is obtained by directly correlating the most intense O—D stretching fre- 
quency, obtained by WT of the corresponding Aro_p(t) function, with 
the values of the FHP donor function. This is displayed in fig. 6.11, 
showing how the wavelet-calculated VDOS for all OD sites of the diols 
changes with the value of the F function (which is confined in the 
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Figure 6.9: FHB evolution with time for interaction of DO molecule # 22 with site 
1 of PDO (full blue line). In dotted red lines the sampled time-steps for which 
the wavelet spectra are reported in fig. 6.10. The red circles represent the FHB 
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Figure 6.10: Wavelet spectra at four different time-steps calculated from intra- 
molecular Aro_p(t) function for site 1 of PDO (parameter 7 = 10). 
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[0 — 1] interval due to the fact that only a single donor H-bond can be 
present at a time). The plotted quantities are the probability distributions 
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Figure 6.11: Correlation spectrograms between O — D stretching frequency and 
FHB function. 


of the VDOS, which thicken and are redshifted when F}? approaches 1, 
meaning that the inter-molecular H-bond lowers the O—D intra-molecular 
stretching frequency. Consistently, the VDOS is peaked at higher frequen- 
cies when FHB ~ 0. As it can be seen, there is a sort of pathway between 
the two extremes in the frequency/configurational space explored by the 
simulations. For sites 2 of both glycols the pathway is continuous, due to 
a more oscillating character of the FB function with too fast oscillations 
to be precisely resolved even with WT. 

As a whole, the present analysis shows how a structural quantity like 
FHB designed as a probe of the H-bonding, can be efficiently correlated 
with the stretching frequency of an intra-molecular vibration, because the 
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molecular oscillators are coupled to the water environment and are sensi- 
tive to small fluctuations in the solvation cage. Moreover, the advantage of 
a continuous function like F#® to monitor on the fly the dynamics of the 
hydrogen bond and its effect on the vibrational spectrum is quite evident. 

Wavelet analysis has also been used to sort out the time evolution 
of the frequency related to the most intense peak in the high resolution 
spectra (obtained like those of fig. 6.10) and its correlation with the inter- 
molecular ©... D bond length between nearest molecules along the simu- 
lation. The result is shown in fig. 6.12. The blue line in fig. 6.12, drawn as 
a guide for the eye, has been obtained by a 10'"-order polynomial fit of the 
raw data. A straight correlation between the time evolution of vibrational 


a) EG, site 1 b) EG, site 2 


EY, yt yt y NM 1 oop E e 1] 
0 10 20 30 10 20 30 


time (ps) time ( ps ) 
c) PDO, site 1 d) PDO, site 2 


C 1 1 1 L n 4 
0 10 20 30 
time (ps) time ( ps ) 


Figure 6.12: Comparison of the time evolutions of H-bond length and O—D 
stretching frequency. The brown lines correspond to the actual values found 
during the simulations, whereas the blue lines represent the smoothed trends of 
the former. 


frequencies and the time evolution of intra-molecular O—D stretching dis- 
tance can be appreciated. 
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A more straightforward way to show the correlation between stretch- 
ing frequencies and inter-molecular Dgjoi- * * Owater distance is via the two- 
dimensional spectrogram plot of fig. 6.13, where distance and frequency 
are directly correlated at each time-step. The probability distribution re- 
ported in fig. 6.13 represents the maximum of the intra-molecular O-D 
band stretching obtained by WT correlated with the corresponding value 
of the inter-molecular H-bond length. These spectrograms show the same 
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Figure 6.13: H-bond length — frequency spectrograms for both PDO and EG. 


“banana shaped” distributions (as referred to in References [MSC08b,MSC08a]), 
obtained for other systems like heavy water and Cl” anions in heavy wa- 
ter. Moreover, the same type of shape is obtained using inter-molecular 
H-bond distances or intra-molecular O—D bond lengths, since they both 
mainly probe the stretching frequency associated to the O—D stretching 
of the glycols. The VDOS thickens at bond lengths corresponding to the 
first peak of the g(r) and g(r,0) functions. The frequency shift is thus 
simply correlated to the strength and stability of the H-bond. A strong 
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H-bond gives a g(r) function peaked at shorter distances and, within the 
harmonic approximation, it reduces the intra-molecular stretching force 
constant and, consequently, shifts the VDOS at smaller wavenumbers. The 
slightly different plot for EG, site 2, is due to the fact that the most strongly 
interacting water molecule for that site is not H-bonded for the first pi- 
coseconds of simulation, thus resulting in a final spectrogram showing a 
weaker H-bond, as expected. 

It is not possible to establish an exact bijective relationship between 
bond length and frequency, but this is not surprising because the radial 
distance is only one of the structural parameters that characterise the H- 
bond, the other being the angular parameter. Hence, the distance is de- 
generate in the angle space. Moreover, the Heisenberg-like uncertainty 
principle between time and frequency, and therefore between distance and 
frequency, affects the spectrogram resolution (see Section 4.2). However, 
these plots clearly show that a reliable correlation between bond length 
and density of vibrational states exists, which can be understood on the 
basis of the H-bond stability. This is in agreement with previous results 
obtained using static ab initio calculations [Kle02]. 

Thus, it has been found that the complex pattern of the hydrogen bond 
stretching mode has an inhomogeneous origin and arises from the convo- 
lution of a number of differently shifted vibrational modes that can be 
sorted out by wavelet transform at different time intervals of the MD sim- 
ulation. The time resolved vibrational modes obtained by WT can be 
correlated in a straightforward way with the structure of the solvent cage 
around the solute diols and with structural parameters like the O—D bond 
length or with the strength of the hydrogen bonding. Therefore, WT is not 
only a suitable tool to analyse the vibrational features in term of modu- 
lation of the local structure of the solvation cage but also offers unique 
opportunities to study time-resolved spectroscopic experiments. 
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6.2 Propanediol in Acetonitrile 


6.2.1 Experimental Setup 


The absorption spectra of PDO in ACN (1:60 volume ratio) at room tem- 
perature were recorded in the 3300-300 nanometers region with a spec- 
trophotometer equipped with near-IR and visible light sources. Infrared 
spectra of PDO in ACN and as a pure liquid were recorded in the 2700—3700 
cm“! region. Raman spectra of ACN in various solvents and at vari- 
ous concentrations at room temperature were recorded using the 514.5 
nanometers line of an Ar* laser. The overtone spectra of PDO in ACN 
reported in fig. 6.16 were obtained by subtracting the contribution of the 
pure solvent. 


6.2.2. Computational Chemistry Details 
Car-Parrinello Molecular Dynamics Simulation 


I performed ab initio Molecular Dynamics simulation with the CPMD 
package [CPM] on a system made up of 32 ACN and 1 PDO molecules in 
the microcanonical (NVE) ensemble. I simulated the sample in a cubic box 
of edge length 14.6983 A (to match the room temperature density of pure 
ACN), with periodic boundary conditions. BLYP exchange and correla- 


Figure 6.14: Snapshot extracted from the CPMD trajectory of PDO surrounded 
by 32 ACN molecules. The total number of atoms is equal to the simulation of 
PDO in water. 


tion functional [Bec88, LYP88] was adopted, along with norm-conserving 
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Martins-Troullier pseudopotentials [TM91] and Kleinman-Bylander de- 
composition to describe core electrons. [KB82] Plane wave expansion of 
the electronic wavefunction was truncated at 85 Ry, while electronic ficti- 
tious mass was set to 400 atomic units (a.u.). The system was simulated 
for more than 26 picoseconds with a time-step of 4 a.u. (~ 0.096 femtosec- 
onds) after a thermalisation run of about ~ 3 picoseconds at 300 K by 
velocity rescaling. The average temperature during the main production 
run was 295 + 17 K. 


Classical Molecular Dynamics Simulation 


I performed classical molecular dynamics simulation with the FIST rou- 
tine included into the CP2K suite of programs [CP2]. To model the inter- 
actions of PDO I employed the force field used by Chelli et al. [CPC™99] 
to simulate glycerol. For ACN I used the AMBER force field. [CCB*95] 
The simulated system is the same of the CPMD run reported in the pre- 
vious paragraph. The classical simulation was 500 picoseconds long in 
the microcanonical (NVE) ensemble, with a time-step of 0.5 femtosec- 
onds. Ewald summation was used to model electrostatic interactions. ESP 
charges were obtained with Gaussian 09 [FTS*b] at the BLYP/6-31G* level 
of theory. The production run was preceded by a 25 picoseconds long 
thermalisation with velocity rescaling. 


Ab Initio Calculations 


In order to validate the computational approach employed in the CPMD 
simulation, I performed static ab initio calculations with the Gaussian09 
[FTStb] package at various levels of theory (BLYP [Bec88, LYP88], B3LYP 
[Bec88, LYP88, Bec93b, Bec93a] and MP2) on isolated ACN and on a cluster 
made up of one PDO and one ACN molecule. I would have liked to use 
Gaussian03 [FTS*a] like in the calculations of PDO and EG in water, but, 
unfortunately, with Gaussian03 the geometry optimisation never reached 
convergence at the MP2 level of theory. All Gaussian09 calculations were 
performed with the 6-311++G(d,p) basis set, adopting very tight conver- 
gence criteria for both wavefunctions and geometrical optimisation, with 
an ultrafine integral grid. The structural results are summarised in ta- 
ble 6.4. All the computed harmonic vibrational frequencies are real and 
positive, ensuring that the systems are in their respective equilibrium con- 
figurations. The CPMD optimisation was carried out adopting the same 
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theory level and system | rcn/Ä | rcc/Å | rcu/A | an=c-c/° | dip./D 


ACN BLYP * 1.159 | 1.461 | 1.096 180.0 3.97 
ACN BLYP # 1.164 | 1.463 | 1.099 179.9 4.01 
ACN B3LYP + 1.165 | 1.463 | 1.099 180.0 4.05 
ACN MP2 FF 1.174 | 1.463 | 1.092 180.0 4.30 


Cluster (ACN+PDO) BLYP * 1.157 | 1.459 | 1.096 179.8 
Cluster (ACN+PDO) BLYP + 1.163 | 1.461 | 1.099 179.8 
Cluster (ACN+PDO) B3LYP # | 1.151 | 1.454 | 1.092 179.8 
Cluster (ACN+PDO) MP2 # 1.172 | 1.461 | 1.091 179.8 


Table 6.4: Structural parameters of ACN obtained after geometry optimisation at 
various levels of theory. * Optimisation performed with the CPMD code [CPM]; 
t optimisation performed with the Gaussian 09 suite of programs [FTStb]. The 
rcp distance is the same for the 3 C—H bonds present in a ACN molecule. 


cell dimension and parameters of the simulation run. As table 6.4 shows, 
there are only minor differences between BLYP CPMD and Gaussian09 
calculations. 


6.2.3 Intra-molecular H-Bond of Propanediol 


The IR spectra of PDO as a pure liquid and in ACN solution are reported 
in fig. 6.15. The OH stretching band shows a more pronounced redshift 
in neat liquid PDO than in ACN solution, as a consequence of different 
hydrogen bond interactions. In ACN the band shows a well pronounced 
tail on the low frequency side. Fig. 6.16 shows that two components occur 
in both overtone bands. A deconvolution procedure employing pseudo- 
Voigt profile [OL77] (mixture of 90% Gaussian and 10% Lorentzian func- 
tions) points to the presence of two maxima located at ~ 6856 and ~ 7006 
cm! for the first overtone and at ~ 10119 and ~ 10320 cm! for the 
second one. Adopting the usual empirical formula for overtones derived 
from the Morse diatomic oscillator model [KPTH96], 


v/n = w — wx(n + 1) (6.1) 


where v is the actual frequency, w is the mechanical “harmonic” fre- 
quency, wx the anharmonicity and n the vibrational quantum number 
(which is 2 for the first overtone and 3 for the second one), I obtain that 


the two fundamental bands should occur at ~ 3483 and ~ 3566 cm-!. 
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— pure PDO 


Absorbance / arb. units 


| 1 | n | 1 | L ji 
3000 3200 3400 3600 3800 4000 
wavenumber / cm” 


— 
2800 


Figure 6.15: Experimental IR spectra of PDO in the CH and OH stretching region. 


We extended the same approach to the analysis of the fundamental band 
around 3500 cm™!; a tentative fitting, adopting the same Voigt profiles 
previously described, gave two components at ~ 3485 and ~ 3550 cm™!, 
as shown in fig. 6.17. The agreement with the data extrapolated from the 
overtone spectra is not complete (Av ~ 83 cm! from overtone calculation, 
Av ~ 65 cm"! from fundamental band fitting); the different frequency 
separation in the two cases (~ 83 cm! and ~ 65 cm!) can be reason- 
ably attributed to the different anharmonicity effects in fundamental and 
overtone vibrations. 

A spectral behaviour similar to what observed here for PDO in ACN 
has been reported for gas phase PDO. In the latter case, Cheng et al. 
[CCT11b] reported and calculated the IR spectra for the second and third 
OH stretching overtones, finding a good agreement with the experimen- 
tal data. They interpreted their results on the basis of intra-molecular H- 
bonding, associating the lower and the higher frequency bands to “bonded” 
and “free” conformations, respectively. Analogously, recent papers by 
Chen et al. [CCT11a] for 1,5-pentanediol, 1,6-hexanediol, and by Hazra 
et al. [HKS11] for peroxyacetic acid found evidences of intra-molecular 
H-bonds in the gas phase studying the overtone spectra. 

Ref. [GTK12] reports on a computational investigation of the intra- 
molecular bonding in 9-hydroxy-9-fluorenecarboxylic acid. In Ref. [CCT11a] 
the reported frequency difference between the two components of the sec- 
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Figure 6.16: Overtone spectra of PDO in ACN. Black lines are the experimental 
spectra, red and blue lines are the fitting bands and orange lines represent the 
total fitted spectra. 


ond overtone is Avp_3 = 310 cm!, of the same order of magnitude of the 
value (210 cm!) that I obtained from the fitting of the spectra in ACN 
solution. The occurrence of intra-molecular hydrogen bond was also re- 
cently reported in other similar systems, both from experimental results 
and from ab initio calculations [VFMC11, DPL11]. 


An intra-molecular H-bond is also present in our CPMD simulations 
of PDO in ACN, as well as an inter-molecular H-bond between one diol 
OH group and the nitrogen atom of a solvent molecule. 


A relevant aspect of the simulation trajectory in the phase space has to 
be stressed. The two OH groups of PDO are obviously chemically equiv- 
alent and, in the ergodic limit of an infinitely long simulation, should ex- 
hibit the same behaviour. In fact, during the simulation run both groups, 
labelled OH(1) and OH(2), form and break a H-bond with the CN group 
of the solvent molecules (see fig. 6.18B). On the contrary, of the two possi- 
ble intra-molecular H-bonded configurations, i.e. OH(1) donor to OH(2), 
and OH(2) donor to OH(1), only the first conformation happens to be re- 
alized in the 26 picoseconds run. As shown in fig. 6.18A, OH(1) forms 
a "bridge" with OH(2); the latter, in turn, may be involved as a donor in 
a H-bond to an ACN molecule. This "chained" H-bond configuration, il- 
lustrated in fig. 6.18A, is actually observed in the first ten picosecond of 
our simulation. This behaviour has to be ascribed to the limited length 
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Figure 6.17: OH stretching fundamental band of PDO in ACN. Black line is the 
experimental spectrum, red and blue lines are the fitting bands and orange line 
represents the total fitted spectrum. The two green and two violet points are 
the harmonic frequencies obtained from normal mode calculations performed 
with Gaussian 09 at BLYP/6-311++G(d,p) level of theory on two model system 
made of one PDO and two ACN molecules forming one intra and one inter- 
molecular H-bond (green points) or two inter-molecular H-bond (violet points); 
intensities are not in scale and are reported on the experimental spectrum for 
better visibility. 


of the simulation and to the fact that the sample contains only one PDO 
molecule. The resulting statistical limitation, indeed, does not allow us to 
make any quantitative estimate of the relative abundances of the different 
intra- and inter-molecular conformations. Nevertheless, it does not pre- 
vent us from pointing out the structure-frequency correlations that are the 
main objective of our work. Actually, the different roles played by OH(1) 
and OH(2) in the simulation allow treating the two groups as two differ- 
ent sites and to easily distinguish the contributions to the spectral features 
of the different structural realisations. 


In order to obtain an estimate of the statistical significance of the con- 
formational exchange resulting from the CPMD run, I also performed a 
classical MD simulation on the same system, extending the run to 500 ps 
(~ 20 times longer than the CPMD run). The classical trajectory shows a 
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Figure 6.18: Two snapshots extracted from the CPMD trajectory; panel (A) shows 
an intra-molecular H-bond between the OH(1) - - -OH(2) (green dotted line), and 
an inter-molecular H-bond between OH(2) and one acetonitrile molecule (red 
dotted line); panel (B) shows both OH groups engaged in inter-molecular H- 
bonds (red and blue dotted lines). 


total of 34 “switching events” involving both OH groups,! corresponding 
to an average switching rate per hydroxyl group of ~ 0.9 events in 25 ps, 
in qualitatively agreement with the CPMD simulation. This is displayed in 
fig. 6.19. However, I have to warn that adopting different choices for the 
force-field, different behaviours are found, suggesting, once again, that 
classical MD simulation results should be handled with much caution. 


lIn the case of this classical MD run, for the sake of simplicity, I have considered the 
two hydroxyl groups H-bonded if at least one of the two possible OH- - - OH distances is 
below the cutoff value of 2.5 A. 
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Figure 6.19: Intra-molecular H-bonds distances obtained from a classical MD run. 
There are ~ 34 H-bonding events. The region coloured in light red denotes the 
criteria adopted to consider PDO engaged into intra-molecular H-bonds. 


Having assessed the plausibility of the intra-molecular H-bond, I now 
go on with the discussion of the CPMD simulation. Also the unnor- 
malised and normalised pair radial distribution functions (h(r) and g(r), 
respectively) for the OH. --N interactions involving the two OH groups 
(see fig. 6.20) show the signature of the different behaviours of the two 
moieties. It is evident that OH(1) forms H-bond with the solvent less 
frequently than OH(2). The average bond length of O—H(1)((roxq1)) 
throughout the simulation run is ~ 0.983 A, whereas (ToH(2)) is ~ 0.994 
A. This suggests that OH(2) covalent bond is weaker than OH(1), in qual- 
itative agreement with the fact that OH(2) can be engaged simultaneously 
in H-bonding with two different partners (as an acceptor with OH(1) and 
as a donor with a ACN molecule). A more quantitative estimate can be 
extracted from the radial distribution functions of the OH. -- OH pairs of 
the PDO molecule, as reported in fig. 6.21. 4The maximum at about 2 
A for the OH(1)- - - OH(2) interaction (blue line) and its absence for the 
OH(2)- -- OH(1) pair (red line) show that the intra-molecular H-bond ac- 
tually occurs only in one direction, as shown in fig. 6.18A. 

This explains the differences seen in fig. 6.20: in fact, summing up the 
values for all the unnormalised h(r) functions for each OH sites, a very 
similar distribution is obtained. We also want to point out that a direct 
comparison between the radial distribution functions of fig. 6.20 and fig. 
6.21 should only be restricted to the h(r) functions, since the normalisation 
factors adopted for the two g(r) are different. In fact, for the two OH. --N 
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Figure 6.20: Radial distribution functions for the OH ---N pairs. Red and blue 
lines are associated to the two different OH groups of propanediol. OH(1) site in 
blue, OH(2) in red. 


g(r), the normalisation factor corresponds to all the possible OH(i)- -- N 
pairs, which are 32, whereas for the OH---OH g(r) the number of all 
possible OH(i)- -- OHG) pairs is 1. 


In order to extract the H-bond dynamics from MD simulations, I used 
the FHB function [PCRS03,PCS05, FPCS06,CPLRO9,PMMC*10,PMMC*11] 
already employed for PDO and EG in water. In the upper panel of fig. 
6.22 each horizontal line corresponds to a different solvent molecule (the 
molecular index); the blue and red bars represent the occurrence of solute- 
solvent inter-molecular H-bond, involving OH(1) and OH(2). In the lower 
panel the value of F™® for the intra-molecular bond is reported as a func- 
tion of time. The inter-molecular H-bond involving OH(2) is clearly more 
stable, whereas that of OH(1) has a very short life time resulting in an 
intermittent behaviour. The intra-molecular H-bond is present for the 
first 10 picoseconds of the simulation time and is rather stable. The dif- 
ferent solvation state of the two OH groups of PDO strongly affects the 
vibrational spectrum. The vibrational density of states (VDoS) is obtained 
from the CPMD trajectories as the Fourier transform of the atomic veloc- 
ity autocorrelation function. In fig. 6.23, upper panel, I show the overall 
VDoS of the simulated system (upper panel) and the VDoS of the so- 
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Figure 6.21: Radial distribution functions for the two OH---HO pairs. The 
OH(1)---OH(2) and OH(2)---OH(1) interactions are represented with blue 
and red lines, respectively. 


lute only, obtained by restricting the Fourier transform to the atoms of 
PDO. A more detailed analysis can be done by restricting the VDoS calcu- 
lation to the OH groups of PDO, which are the most affected by intra- 
and inter-molecular H-bonding. As shown in fig. 6.24, the VDoS of 
OH(2) is significantly redshifted with respect to that of OH(1). Quanti- 
fying this effect on the basis of Fourier analysis of the velocity autocor- 
relation function is rather difficult, as the resulting power spectra show a 
multitude of peaks in the OH stretching region (fig. 6.24, upper panel). 
WT analysis provides a convenient method of smoothing the calculated 
spectra [SLC03, Ehr02,SSZ* 06], so that the maxima of the two VDoS can 
be located with better accuracy; fig. 6.24, lower panel, shows the resulting 
VDoS amplitude, consisting of the time-integrated spectrogram plotted 
along the frequency axis. 


I have to point out that the wavelet code adopted to study the PDO/ ACN 
mixture (and all other research studies exposed hereafter) is based on the 
Fourier space implementation (see Section 4.2.3). 


The two main maxima are found at about ~ 3514 and ~ 3594 cm!, 


giving a frequency splitting Av ~ 80 cm!, to be compared to the value 
(~ 83 cm!) obtained from the overtones using equation 6.1. Therefore, 
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Figure 6.22: (upper panel) as a function of the ACN molecular index, a 
point is added if FB > 1073. Red points stand for OH(2)--- ACN interactions, 
whereas blue points stand for OH(1):--ACN interactions; (lower panel) F'8(t) 
for the intra-molecular H-bond, depicted as green line and points. 


CPMD simulation analysis confirms the results of the fitting of the exper- 
imental data and the extrapolated frequency splitting. Moreover, it sug- 
gests that the origin of the observed structured band in the OH stretching 
region is the different solvation dynamics and, in particular, the different 
H-bonding behaviour. This is analogous to what observed and calculated 
for PDO in the gas phase [CCT11b], with the remarkable difference that 
in the gas phase the system alternates between intra-molecular “bonded” 
and “free” conformations only, while in ACN solution there is, in ad- 
dition, the possibility of an inter-molecular H-bond which, according to 
our simulation, involves at least one hydroxyl group for most of the time. 
Since only OH(1) acts as intra-molecular H-bond donor group during our 
26 picoseconds long simulation (and, specifically, only for the first 10 pi- 
coseconds), it has been easy to separate the two “free” and “bonded” 
contributions. 


The free energy difference between the intra-molecular “bonded” and 
“free” conformations can give an idea of the relative populations of the 
two conformations and of their contributions to the total spectrum. Ta- 
ble 6.5 reports the results of BLYP/6-311++G(d,p) calculations done with 
Gaussian09 [FTS*b] (very tight convergence criteria for wavefunction and 
geometric optimisation, ultrafine integral grid) on two model systems, 
consisting of a PDO molecule surrounded by two ACN molecules with 
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Figure 6.23: VDoS obtained from Fourier transform of the velocity autocorrela- 
tion functions for the simulated system (upper panel) and for the solute only 
(lower panel). Spectra are uniformly scaled by a factor of 1.055. 


and without intra-molecular H-bonds, to estimate their relative difference 
in free energy, and thus their relative population in gas phase. The cal- 


PORE [POH [as zum" 


“intra” cluster 0.985 0.984 -43.8 -84.9 0 
“open” cluster 0.978 0.978 -81.0 -65.6 +1.02 


Table 6.5: Comparison of structural and energetic features of a cluster of PDO 
and 2 ACN molecules. PDO in the cluster is engaged in two donor H-bonds with 
the two ACN molecules (“open cluster”) or with just one due to intra-molecular 
H-bonds (“intra cluster”). Structural parameters obtained after geometry optimi- 
sation Gaussian 09 suite of programs [FTS*b] adopting the BLYP/6-311++G(d,p) 
level of theory. Angles ¢ OH(1) and p OH(2) are dihedral angles involving two 
carbon atoms of the PDO and OH(1) and OH(2), respectively. 


culated geometric parameters of the two systems are collected in the first 
four columns of table 6.5. Within the limits of this simplified model, the 
relatively small free energy difference at 300 K (1.02 kJ/mole) suggests 
that the two conformers can both be present at room temperature. These 
two possible conformations correspond to specific normal modes whose 
frequencies, obtained from Gaussian09 [FTS*b] calculations, are reported 
in table 6.6 and displayed in fig. 6.17 as green (“intra” cluster) and violet 
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Figure 6.24: VDoS extracted from the trajectories of the hydroxyl groups only. 
Spectra are uniformly scaled by a factor of 1.055. Upper panel: spectrum calcu- 
lated with Fourier Transform; lower panel: spectrum calculated by WT adopting 
the value of o = 27 for optimal smoothing. 


(“open” cluster) points. It is worth noting that the frequency separation 


“intra” cluster 3462 3490 
“open” cluster 3596 3598 


Table 6.6: Normal mode calculations performed with Gaussian09 [FTS*b] at 
BLYP/6-311++G(d,p) level of theory. Normal modes vı and v2 stand for the 
two OH stretching normal mode. 


between the two normal modes involving OH stretching is appreciable for 
the “intra” cluster, due to the different environments of the two hydroxyl 
groups. On the contrary, the frequencies for the “open” cluster are almost 
coincident and substantially higher than those of the intra-molecularly 
bonded cluster, since both OH are engaged in the same type of H-bond 
with ACN. The superposition of these two rather different spectral pro- 
files, corresponding to the coexistence of the two conformers in the room 
temperature solution, is fully compatible with the broad and asymmetric 
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shape of the experimental spectrum (see fig. 6.17) 


The correlation between H-bonding and O-H stretching frequency 
can be appreciated further exploiting the time-frequency analysis pro- 
vided by wavelets, along with the F™® function. For the analysis reported 
in fig. 6.25 I adopted a different choice for the g, and og parameters defin- 
ing the FB function according to equations 3.9 and 3.10, in order to ap- 
preciate also minor changes in the solvation environment. In particular, I 
put c, equal to 2 and gẹ to 10.0 degrees, twice its usual value (see Section 
3.1.3 for details). The contribution of the intra-molecular H-bond function 
(see fig. 6.22, lower panel) is added to OH(1) and OH(2) inter-molecular 
FHB functions, since both groups are involved in the bond. The correlation 
of the instantaneous frequency to the solvation state can be appreciated in 
fig. 6.25, where I report the opposite of FHP and the H-bond function as a 
function of the simulation time. 
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Figure 6.25: a) Hydrogen-bond function (changed in sign) —F HB as a function of 
time for OH(2) site; b) “instantaneous” O-H(2) stretching frequency obtained by 
WT with o = 2; c) —F"P as a function of time for OH(1) site; d) “instantaneous” 
O-H(1) stretching frequency obtained by WT with ø = 2. Gray lines correspond 
to the actual values found during the simulation; the red and blue lines, drawn 
as a guide for the eye, were obtained by a 10" order polynomial fit of the raw 
datasets. The frequencies are uniformly scaled by a factor of 1.055. 
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The results summarised in fig. 6.25 are generally consistent with those 
shown in fig. 6.22 and 6.24: the broader frequency range spanned by 
VDoS of OH(2) and the lower value of the mean frequency correspond 
to the larger values (> 2) attained by F® for this group, which is en- 
gaged simultaneously in intra- and inter-molecular H-bonds for the first 
10 picoseconds of the simulation. It is of some interest to correlate the 
instantaneous OH frequencies to the shortest OH---N distance at every 
simulation time-step as done for the diols in water. The results of this anal- 
ysis performed for PDO in ACN are shown in fig. 6.26. The spectrograms 
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Figure 6.26: Correlation spectrograms between the OH stretching frequency and 
lowest OH - -- N distance; a) OH(1); b) OH(2). Frequencies are uniformly scaled 
by a factor of 1.055. WT parameter o has been put equal to 2. 


have the usual banana shape; as expected, the OH(1) VDoS is distributed 
over a larger interval of OH --- N distance, and it thickens also out of the 
bonding region (i.e. for roH..n > 2.5 A). In contrast, the distribution of 
OH(2) frequencies is redshifted and it is practically all included in the 
bonding region. 


In conclusion, the peculiar spectral features in the region of the OH 
stretching vibrations were interpreted as due to the presence of concurring 
intra- and inter-molecular hydrogen bonds, with the acceptor hydroxyl 
stretching frequency more redshifted that of the donor. The dynamics of 
acetonitrile in the presence of the glycol solute can be seen as a case of 
“chemical exchange” widely studied by means of 2D IR spectroscopy. 
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6.2.4 ... and What About the Nitrile Stretch Blueshift? 


The dynamical response of the C=N group of ACN to H-bond formation 
is completely opposite to that of the hydroxyl group of PDO. In general, 
the occurrence of H-bond causes a redshift of the OH stretching frequency. 
The presence of an intra-molecular H-bond (as in the case of PDO in ACN) 
can make the interpretation of the spectra more complicated, but the over- 
all effect on the OH frequency is well established. The C=N stretching 
band, conversely, is blueshifted when nitrile is engaged as acceptor of H- 
bond. Experimental works by Kim and Hochstrasser reported this some- 
what unusual feature for several different and heterogeneous nitrile mix- 
tures (ACN in methanol [KH05, KH09]; benzonitrile, cinnamonitrile and 
HIV-1 RT/TMC278 complex, which contains a cyanovinyl nitrile group, in 
aqueous solution [KH09]) by 2D IR experiments. In particular, for ACN in 
methanol they observed a Av separation of ~ 8 cm! between “free” and 
H-bonded CN stretching frequency [KH05]. This behaviour can also be 
ascertained by measuring the Raman spectra of ACN in various solvents, 
as reported in fig. 6.27 (the small peak at about ~ 2300 cm! is just a 
combination band of CC stretch and CH bend of ACN [KH05]). 
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Figure 6.27: Experimental Raman spectra of ACN in various solvent. 
In non H-bonding solvent (CCh) the C=N stretching frequency is 


about ~ 6 cm! lower than in water. The same could be said for pure 
ACN, whose spectral response is overall similar to that of PDO in CCly. 
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Even more interesting, using PDO as a solvent I observe a somewhat in- 
termediate behaviour, with a main maximum at 2256 cm! but at the 
same time a clear shoulder at about ~ 2262 cm~!. The shoulder becomes 


more evident with the increase of the [ACN] molar ratio. This experi- 


mental result suggests that, in the presence of PDO, the ACN molecule 
gives rise to a sort of “chemical exchange” like that observed by Kim and 
Hochstrasser in References [KH05] and [KH09] for ACN in methanol. The 
experimental conditions of these measurements (ACN in PDO solvent) are 
different from those of our simulations (PDO in ACN solvent), but the dy- 
namic response of the C=N stretching when involved in the H-bond can 
be assumed to be comparable. The frequency calculations for free and 
H-bonded ACN, performed with the DFT and MP2 approach (see Sec- 
tion 6.2.2), provide a reliable support to the interpretation of the shoulder 
clearly visible at 2262 cm“! in the C=N stretching band of ACN peaked 
at 2256 em. 

The BLYP calculation predicts for the C=N stretching mode the fre- 
quency of 2262 cm! for unbound ACN, and of 2276 cm”! for ACN bound 
to PDO, with a frequency shift of Av = 14 cm=!. Very similar results are 
obtained from B3LYP and MP2 calculations, as shown in table 6.7. Al- 
though the frequency difference is larger than the experimental value, it 
is remarkable that all the calculations predict, in agreement with the ex- 
periment, an upshift of the C=N frequency when the group is engaged in 
H-bonding. 

In other words, both experiments and calculations point to an increase 
of the strength of the C=N bond when ACN is H-bonded. 


a 


Avc=n | 14cem7! | 13 cm! | 15 cm~! 
Arc=n | 0.0018 A | 0.0015Ä | 0.0014A 


Table 6.7: Difference between the C = N stretch frequency of H-bonded and 
free ACN at various levels of theory, performed with the Gaussian09 suite of 
programs. [FTS*b] 


A confirmation of this picture comes from the analysis of the electron 
density obtained by calculating the Maximally Localised Wannier Func- 
tions (MLWF) centres [MV97,SMVP98] (see Section 3.2). In particular, I 
extracted two configurations from the simulated trajectory and performed 
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static ab initio calculations of the MLWF using the CPMD code [BMR*00]. 
The positions of these centres represent the most probable locations where 
to find electron pairs. The distance between the nitrogen electron lone pair 
and the three bonding electron pairs of the triple C=N bond increases by 
Aree ~ 0.01 A when nitrogen is engaged as an acceptor in H-bond with 
the hydroxyl groups of PDO. [Wea79] This difference is certainly small 
in absolute terms, but the frequency splitting of the vibrational band is 
small too. We notice that this difference is consistent with the results of 
the Gaussian 09 calculations (irrespective of the level of theory) shown 
in table 6.4: the distance between C and N atoms is slightly reduced 
when ACN acts as H-bond acceptor from PDO. To check the reliability 
of our calculation, I have also computed, using CPMD, the dipole mo- 
ment of bonded and free ACN: I found a value of 3.97 D for isolated 
ACN and of 5.01 D for hydrogen-bonded ACN: the isolated ACN value is 
close to the experimental value of 3.91 D measured in the gas phase. We 
used the CPMD program to calculate also the Electron Localisation Func- 
tions [BE90] (ELFs) (see Section 3.2), collected in fig. 6.28, to characterise 
the electron density in the region of the C=N bond. The figure shows that 
a change in the shape of the electronic probability density decreases on 
the nitrogen atom upon formation of the PDO-ACN hydrogen bond. 


bonded ACN 


free ACN 


Figure 6.28: Contour lines of the ELF along the plane passing through the N, C, 
C atoms of ACN, for free and isolated ACN (left) and with the environment and 
PDO (right). 


Both MLWE centres and ELFs suggest that some charge spreading ac- 
tually occurs along the N---OH line, mainly due to the electron den- 
sity displacement from the nitrogen lone pair toward the PDO molecule. 
Consequently, a decreased electronic repulsion results between the three 
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Figure 6.29: Difference between the electron density distributions of free and hy- 
drogen bonded ACN, along the plane passing through the Cacn, Nacn, Be 


atoms. 


bonding pairs of the C=N triple bond. A support to our interpretation 
is obtained by taking the difference between the electron density distribu- 
tions calculated for free and hydrogen bonded ACN, reported in fig.6.29. 
The blue regions denote an increase in the electron density, whereas the 
red ones correspond to a decrease. A charge transfer from the N atom 
toward the hydroxyl group evidently takes place, and the electron den- 
sity along the nitrile triple bond increases due to the reduced repulsive 
interaction. This effect corresponds to the strengthening of the C=N bond 
force constant, resulting in the upshift of the corresponding vibrational 
frequency. 


Recovering the C=N Stretch Blueshift with CPMD 


My CPMD simulation does not reproduce the small frequency blueshift of 
the C=N band: as a matter of fact, weighting the VDoS by F® (t) function 
and its complementary as we have done in Ref. [PMMC*10] for methyl 
acetate in methanol (see following Chapter 7), “free” and H-bonded con- 
tributions to the VDoS appears at about the same frequency of ~ 2260 
cm™!. To find the reasons of this inability to reproduce the spectral fea- 
tures of ACN, many possibilities have been taken into account. 

In Ref. [AL07] and [TJM *05], effects due to the formation of “head-to- 
tail” complexes (mainly due to improper H-bonds) of ACN molecules are 
included in the C=N stretching elucidation. However, the formation of 
such complexes in my CPMD simulation is rather sporadic and apparently 
uncorrelated to the instantaneous C=N stretching frequency as calculated 
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by WT. 

To check the influence of the exchange-correlation functional adopted, 
I performed another simulation (hereafter called PBE!), adopting the same 
procedure and parameters described in Section 6.2.2, albeit employing 
PBE exchange-correlation functional [PBE96], because recently Thanthiri- 
watte et al. [THBS11] found that this choice appears to lead to better re- 
sults in H-bonded systems. Simulation PBE! reproduces the same struc- 
tural and spectroscopic results discussed above: the intra-molecular HB is 
formed and the spectral shift for OH(2) is recovered, but a blueshift for 
C=N stretch is still not observed. 

Having asserted that this inconsistency probably is not functional- 
specific for PBE nor BLYP, 2I performed a third CPMD simulation (simu- 
lation PBE") adopting the same procedure and parameters of simulation 
PBE!, but this time thermalised at a temperature of 180 + 15 K. At this 
much lower temperature, H-bonds no longer are broken nor formed. PDO 
only interacts with the same two ACN molecules and no intra-molecular 
H-bond is formed (the starting configuration that we have adopted did 
not have intra-molecular H-bond as for the other two simulations). The 
two inter-molecular H-bonds are stable for the entire simulation run, as 
summarily described by fig. 6.30. As a result, the VDoS of the two 
hydroxyl groups are completely overlapped, because they display the 
same H-bonding behaviour. Comparing the VDoS of the 30 “free” ACN 
molecules against the two H-bonded ones, a blueshift of about ~ 8 cm! 
is appreciable, as reported in fig. 6.31, while the main features of the three 
Car-Parrinello simulations are summarised in table 6.8. I performed fur- 


intra-molecular HB of PDO | blueshift of H-bonded CN groups 


—_ ar 
PBE! | 300K 
PBE! | 180K 


Table 6.8: Summary of the three CPMD simulations performed. 


ther analysis with WT, adopting the same procedure discussed for PDO, 
and the resulting spectrogram for the two H-bonded ACN molecules is 


2This could also be inferred from the static ab initio calculation reported in the previous 
section, which were able to reproduce the blueshift using the BLYP functional 
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Figure 6.30: FHB as a function of the ACN molecular index for PBE! simulation, 
a point is added if FHB > 1073. Red and blue points stand for the two different 
OH groups. 


displayed in fig. 6.32. It can be appreciated that a “banana shaped” corre- 
lation is present, but inverted with respect to that of fig. 6.26: the decrease 
of the N--- HO length is associated to a shift toward higher frequencies, 
recovering the known experimental behaviour. Thus, our results is that 
CPMD can, from a theoretical point of view, reproduce this unusual spec- 
tral feature. On the other hand, longer H-bond lifetimes are required to 
appreciate difference in frequency of just ~ 4 cm~!. Since Av = 1/2Te, 
with Av the frequency separation (in cm-!), T the H-bond lifetime (in 
seconds) and c the speed of light in vacuum (in meters/seconds), a ~16 
picoseconds long lifetime is required to correctly separate two peaks due 
to different H-bonding environments. 


This can explain why both simulations performed at 300 K (BLYP and 
PBE!) failed to recover the ACN blueshift, whereas simulation PBE! at 
180 K gives a frequency-distance correlation that is opposite to that of the 
OH stretching, as expected. Moreover, stochastic structural fluctuations 
in the solvation cage can induce frequency fluctuations large enough to 
hide (or compensate) the small blueshift induced by hydrogen bonding: 
a more stable H-bond ensures that these fluctuations have only minor 
impact onto the vibrational spectra. 


I need to point out that the thermodynamic conditions of simulation 
PBE! are obviously different from those of the experimental measure- 
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Figure 6.31: Calculated VDoS of the 30 free (upper panel) and the 2 H-bonded 
(lower panel) ACN molecules for the PBE! simulation. Frequencies are uni- 
formly scaled by a factor of 1.033. A Lorentzian smoothing has been adopted. 


ments. In particular, at 180 K, acetonitrile should be a solid and therefore 
our conclusion regarding the blueshift recovery with CPMD have to be 
carefully considered: we have elucidated the causes of the incorrect vibra- 
tional response at room temperature performing a simulation at a lower 
temperature and in a different phase. Anyway, the idea of “freezing” the 
system is not so unusual in the field of MD simulations, as shown by re- 
cent works of Nicolini et al. [NC09, NFC11]. In these two latter references, 
the “freezing” of (part of) the molecular system was exploited to better 
sample the conformational space of a large biomolecule, whereas in my 
case I resorted to a low temperature simulation to slow the dynamics for 
a better sorting out of the effects affecting the C=N stretching. 
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Figure 6.32: Correlation spectrogram between the CN stretching frequency and 
lowest N --- HO distance. Frequencies are uniformly scaled by a factor of 1.033. 
WT parameter g has been put equal to 2. 
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7. Methyl Acetate in Protic Solvents 


Methyl acetate (MA) has been subject to detailed study at LENS labo- 
ratories for ultrafast spectroscopy. MA is a molecule simple enough to 
work as a target for high accuracy experiments, and at the same time ex- 
hibits many features in common with biologically active molecules, such 
as amino acids. 

The solvation dynamics of MA in heavy water was probed by two- 
dimensional IR experiments by Dr. M. Candelaresi (then at LENS, Univ. 
of Florence, now at the Univ. of Glasgow, UK) and by CPMD simulations 
performed by Dr. M. Pagliai (Chemistry Dept., Univ. of Florence) during 
year 2009 [CPLRO9]. The C=O stretching IR band of MA splits into a dou- 
blet in water as a consequence of the H-bond interaction with the solvent, 
which leads to the equilibrium between two solvated species, consisting 
of one MA molecule bonded to one and two water molecules [CPLRO9]. 
The combined approach consisting of 2D IR experiments and CPMD sim- 
ulation proved useful to understand what happens to MA in water and 
how the environment affects the C=O stretching mode, just falling short 
to reconstruct the IR spectrum shape. In fact, on year 2009, wavelets were 
not yet in use at LENS nor at the Chemistry Dept. of Florence to perform 
time-frequency analysis. 

I applied wavelets to the trajectories of MA in water and methanol 
provided to me by Dr. Pagliai to perform this last step of interpretation, 
allowing to immediately associate VDoS to a specific solute-environment 
interaction. 

As mentioned above, the vibrational spectrum of MA changes signif- 
icantly due to the type of solvent, in particular in the region associated 
to the C=O stretching [CPLRO09], indicatively at ~ 1700 cm~!. For exam- 
ple, in non-protic solvents like CCl, and ACN, the C=O stretching mode 
gives rise to just a single band in the IR spectrum, and the frequency in 
acetonitrile is lower than in CCly. The interactions in protic solvents are 
much more complex, because the solvents can interact with MA through 
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H-bonds. The number of solvent molecules interacting through H-bonds 
in methanol and water is different, and more caution to approach the wa- 
ter case is required. Thus, since MA in water poses more computational 
challenges than MA in methanol, the latter shall be presented before the 
former. 


7.1 Methyl Acetate in Methanol 


Performing sub-picoseconds IR experiments, Banno et al. [BOY *09, BOT08] 
observed that the C=O stretching band of MA in methanol splits into a 
doublet, like Candelaresi et al. found in aqueous solvent [CPLR09], with 
an average stretching frequency for this vibration higher than in water. 
Banno et al. [BOY 09, BOT08] explained this behaviour by assuming an 
equilibrium between two molecular forms, corresponding to MA involved 
alternatively in zero and one H-bond with the solvent, interpreting the 
presence of a shoulder centred at 1706 cm”! as a proof of MA interacting 
with two methanol molecules. 


7.1.1 Computational Chemistry Details 


The simulations were performed on a system made up by 1 MA and 64 
methanol molecules in a periodic cubic box with sides 16.3095 A. BLYP 
exchange and correlation functional [Bec88, LYP88] was employed along 
with norm-conserving Martins-Troullier [TM91] pseudopotentials within 
the Kleinman-Bylander [KB82] decomposition. The plane wave expansion 
was truncated at 85 Ry, while the fictitious electronic mass was 600 a.u.. 
The system has been thermalised at 300 K for ~2.6 picoseconds by velocity 
scaling. The simulation (in the NVE ensemble) was ~ 20 picoseconds long, 
saving the coordinates at each integration time-step of 5 a.u. (~ 0.12 fs). 
The average temperature was 315 + 10 K. 


7.1.2 Structural Data and Spectrum Interpretation 


To elucidate the H-bond interaction, the most important structural pa- 
rameters for the CPMD simulation of MA in methanol are summarised 
in fig 7.2. In fig. 7.2 we reported the pair radial and angular distribution 
functions for MA in methanol (panels a and b) and in water (panels c 
and d), taken as a reference (the simulation of MA in water has been per- 
formed with the same computational approach [CPLRO9]). In particular, 
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Figure 7.1: Snapshot of the simulation box of MA in CD3OD. 
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Figure 7.2: Panel a: pair radial distribution function (full line) and running inte- 
gration number (dashed line) for MA in methanol; panel c: the same for MA in 
water. Panel b: angular distribution function for MA in methanol; panel d: the 
same for MA in water. 


165 


Modelling of spectroscopic and structural properties using molecular dynamics 


the a and c panels show the radial distribution functions for the O---H 
distance, whereas panels b and d show the distribution function of the 
HO. - - O angles limited to the H-bond interactions. 


By comparison of the g(r) functions (fig. 7.2 a and c) for MA in the 
two solvents, some peculiar differences in the H-bond region can be noted, 
related to the different number of solvent molecules that directly interact 
with the MA molecule (the coordination number, n(r) in fig. 7.2 a and c). 
Analysing the g(r) up to the first minimum (fig. 7.2 a and c), it is possible 
to correlate the structural parameters with the possibility of one solvent 
molecule to leave or enter the first solvation shell. 


Since the accuracy of the g(r) was challenged by the referees, we 
checked it performing subaverages, without observing differences in the 
positions of the first maximum and minimum. The H-bond evolution dif- 
fers in time in the two subaverages, inducing slightly changes in the first 
peak height and in the value of the minimum deep in the g(r). This is a 
check that basically none does as long as distribution functions are rea- 
sonable enough, and was asked to us to verify that our simulation reached 
an ergodic enough behaviour to have statistical meaning. 


For the methanol solution the minimum of g(r) at about 2.8 Ä is much 
lower than in the water solution and, correspondently, the height of the 
first peak is more than twice higher. These two observations indicate 
that the solvent molecules in the first solvation shell are more localised 
in methanol than in water. In other words, the exchange rate of solvent 
molecules between the first and second solvation shell is definitely lower 
in the alcoholic solution. Thus, the H-bond formed by MA with methanol 
is then more stable and probably stronger than with water. As a conse- 
quence of the higher stability of the H-bond interaction, the angular distri- 
bution function is sharper in methanol than in water, as can be observed 
in fig. 7.2 b and d. 


To further clarify this, Dr. Pagliai considered two simple models, con- 
sisting of a MA and 1 or 2 solvent molecules. The geometry of the clusters 
were optimised with DFT calculations at the BLYP/6-311++G(d,p) level of 
theory, adopting the Gaussian03 suite of programs [FTS*a]. Fig. 7.4 shows 
the calculated IR spectra of the two clusters in the C=O stretching region 
and in fig. 7.3 the IR spectra of MA in methanol and water are reported 
for comparison. A comparison with the calculated spectra of analogous 
clusters with water molecules is presented. It can be seen that the redshift 
of the IR band is directly correlated with the number of hydrogen bond 
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Figure 7.3: Experimental IR spectra of MA in methanol and water. 


formed. In the case of MA forming one H-bond two bands are present 
at slightly different frequencies, corresponding to two distinct minimum 
structures, with the methanol or water molecule located on opposite sides 
with respect to the C=O bond. 

For an accurate description of the H-bond dynamics of MA in methanol, 
we adopted for the analysis of our CPMD simulation the hydrogen bond 
function F"P (see Section 3.1.3). The prs function assumes values close 
to 1 for strong H-bond interactions, whereas goes rapidly to 0 when the 
H-bond becomes weaker. 

The combined analysis of the F”? function with the instantaneous fre- 
quency computed by means of the WT provide a more descriptive visuali- 
sation of the H-bond effect on the vibrational frequencies. The time series 
analysed is represented by the variation of the C=O bond length dur- 
ing the CPMD simulation, thus allowing us to compute the frequency of 
the C=O stretching vibration at each time-step of the simulation. Please 
notice that this is an intra-molecular parameter, whereas WT has been 
adopted in the previous Chapter to correlate frequency with an inter- 
molecular distance. 

The wavelet analysis of the C=O stretching vibrational mode allows 
obtaining more information compared to the standard Fourier analysis, 
and can be of valuable help in the spectroscopic and structural character- 
isation of the H-bond of MA in methanol solution. In fact, as can be ap- 
preciated in fig. 7.5, FFT reconstructed VDoS of the C=O stretch presents 
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Figure 7.4: Calculated IR spectra for MA and clusters of MA with 1 or 2 methanol 
(upper panel) and water (lower panel) molecules in the C=O vibrational stretch- 
ing region. The frequencies have been uniformly scaled by a factor 1.021. Blue, 
red and green line correspond to MA involved in 0, 1 and 2 H-bond, respectively. 
The two red bands refer to solvent/MA clusters of different geometry. The cal- 
culated spectra have been reported by assigning to each C=O stretching mode a 
Lorentzian shape with 10 cm! full width at half-maximum. 


a lot of maxima and minima and is of very little help to understand the 
H-bond dynamics. 


Fig. 7.6 shows the analysis performed with wavelet transforms in con- 
junction with the FH? function. The result gives a clear description of 
the response of the C=O stretching mode to the variation of the H-bond 
strength. The presence of the H-bond interaction can be appreciated from 
the FB function reported in panel a of fig. 7.6. It can be seen that a 
hydrogen bond is present only in the first ~ 4 picoseconds of the simu- 
lation with molecule 58 of the solvent and after ~ 12 picoseconds with 
molecule 2. The behaviour of the FFB function (EB and Fue ) shows the 
importance of a continuous function for characterising the H-bond, like 
FB, that accounts for the fast oscillation in and out of the geometrical 
constraints usually taken as a probe of the formation of a H-bond. In fact 
(see panel b of fig. 7.6), as long as the H-bonded molecule resides in the 
first solvation shell, the fast modulation of the C=O stretching frequency 
takes place around 1719 cm! (red dashed horizontal line). Instead, when 
no methanol molecule is H-bonded with MA, the C=O frequency oc- 
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Figure 7.5: Power spectrum of MA in methanol obtained by FFT. Frequencies are 
uniformly scaled by a factor of 1.095. 


curs at higher wavenumber 1750 cm! (blue dashed line). The interac- 
tion with the solvent molecules induces a redshift in the C=O stretching 
frequency of about 30 cm™!, a value slightly higher than experimentally 
observed [BOY* 09, BOTO8]. 


WT allows to visualise this feature extracting “time resolved spectra” 
from the time-series of the C=O distances, as reported in fig. 7.7. Al- 
though no double H-bond configurations have been observed during the 
simulation, in some cases a second molecule approaches MA entering 
the first solvation shell region. This corresponds to the low frequency 
shoulder observed in the experimental spectrum. The C=O stretching 
frequency distribution can be appreciated in panel c of fig. 7.6, showing 
a bandwidth close to that measured in infrared spectra [BOY +09, BOTO8]. 
An attempt to calculate the H-bond life time has been performed by com- 
puting the correlation function of FH? obtaining an estimate of the or- 
der of 1 picoseconds which compares with the experimental relaxation 
time. [BOT08, BOY* 09] We have applied other methods to compute the 
H-bond life time according to the definition given in Ref. [PCRS03]; the 
order of magnitude of the life-time does not change ranging in the values 
between ~ 0.5 and ~ 2 picoseconds. 

An alternative approach to correlate the results obtained from the FH? 
function with the vibrational frequencies of the C=O stretching mode of 
MA can be accomplished taking advantage of the time localisation capa- 
bility of wavelet transforms. Actually, by this technique it is possible to 
extract at each time-step of the simulation the C=O bond length and the 
corresponding vibrational frequency. 


Thus, we can draw a two-dimensional plot in which the distribution 
of the instantaneous frequencies is expressed in terms of a specific struc- 
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Figure 7.6: Panel a: Evolution of F*P for the molecule with j=2 and 58. Molecule 
with j=58 forms an H-bond in the first part of the simulation, whereas molecule 
with j=2 interacts with MA in the last part of the simulation. Panel b: Time 
evolution of the maximum of the C=O stretching band during the simulation. 
The dashed lines represent the average frequency of C=O stretching mode in 
presence (red) or absence (blue) of H-bond. Panel c: C=O stretching frequency 
distribution for MA during the simulation. The red and blue bars indicate the 
frequency distribution of MA involved and not involved in H-bond interaction, 
respectively. The frequencies have been uniformly scaled by a factor 1.095 


tural parameter. Fig. 7.8a shows the probability distribution of the C=O 
stretching frequency as a function of C=O bond length. As expected, the 
surface is characterised by the presence of two main regions, with the 
maxima separated by ~ 30 cm“!, corresponding to MA engaged or not 
engaged in the H-bond interaction. It is interesting to observe that the 
shape of the two regions is quite different: this allows to pinpoint the 
correspondence between structural changes and frequency values. The 
presence of the H-bond not only leads to a redshift of the C=O stretch 
frequency, but also makes the C=O bond to span a larger length range. 
This behaviour can be better appreciated in figures 7.8b and 7.8c, where a 
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Figure 7.7: Wavelet spectra of MA in methanol as a function of time. 


clear separation of the two spectral regions is obtained by weighting the 
probability distribution by FH? and (1-FP), respectively. 

The possibility to localise the vibrational modes in time and frequency 
domain by wavelet transform makes molecular dynamics simulations par- 
ticularly effective in the study of H-bond dynamics and helpful in the 
comprehension of the experimental results obtained with time-resolved 
spectroscopies. 
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Figure 7.8: a) Probability distribution of the C=O stretching mode as a function 
of the C=O bond length. The b) and c) surfaces represent the probability dis- 
tribution given in a), weighted by FB and (1 — F"P) factor, respectively. The 
frequencies have been uniformly scaled by a factor 1.095 
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7.2 Methyl Acetate in Water 


Simulation Details 


The computational approach adopted is completely analogous to that em- 
ployed in Ref. [CPLRO09] and for studying MA in methanol, except for the 
simulation length (increased to 20 picoseconds, as for MA in CD3OD), the 
plane waves cutoff (increased to 85 Ry), and the simulation cell size (the 
side is smaller, 12.6694 A). 


C-Or 


Figure 7.9: Snapshot of the simulation box of MA in D20. 


7.2.1 Spectral Interpretation 


Interpreting the vibrational spectrum of the C=O stretch band of MA in 
water is actually more difficult than in methanol. In methanol, there is a 
“chemical exchange” between MA and an H-bonded cluster MA- - - CD3OD. 
In water, the “chemical exchange” is between the H-bonded cluster MA- - - D20 
and the double H-bonded cluster MA--- 2 x D20. In fact, the O atom 
of the C=O group of MA can act as H-bond acceptor with two H-bond 
donor molecules, and, while we have not seen this occurring in methanol, 
it happens in water. 

This has deep consequences onto the vibrational spectrum, as can be 
appreciated in fig. 7.10. It can be seen that a doublet is prominent in both 
solvents. Deconvolution of the experimental spectra shows that in both 
cases a third, much weaker component is present; the three bands have 
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Figure 7.10: Experimental IR spectrum (black line) of MA in methanol (top) and 
water (bottom). The blue, green and red lines refer t0 MA H-bonded to 0, 1, 
and 2 solvent molecules, respectively. The orange dashed line represents to the 
resulting fitted spectrum. 


rather similar frequencies in water and methanol solutions, while their 
relative intensities are remarkably different. As widely discussed in Refs. 
[BOT08, BOYT 09, CPLR09], the three bands are attributed, in decreasing 
frequency order, to MA H-bonded to zero, one and two solvent molecules. 
The relative intensities of the bands show that MA in water is, in large 
prevalence, bonded to one or two solvent molecules, while in methanol 
most MA molecules form zero or one H-bond. This is summarised in 
table 7.1. 


number £ H-bonds ER 


1748 cm” 1745 cm” * (weak) 


1730 cm7 1727 cm” 
1711 cm! (weak) 1704 cm7! 


Table 7.1: Observed frequencies for MA in methanol and water, respectively. I 
have obtained the frequencies by a fitting procedure adopting a combined Gaus- 
sian (90%) and Lorentzian (10%) functions. 


The different solvation number also affects the distance/frequency 
spectrogram for the C=O stretching, reported in fig. 7.11. 
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a) methyl acetate in CD,OD b) methyl acetate in D,O 
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Figure 7.11: Probability distribution of the C=O stretching mode for MA in (a) 


methanol and (b) water, as a function of C=O bond length. A o value of 47 has 
been used. 
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Figure 7.12: Probability distribution of the C=O stretching mode for MA in 


water interacting with (a) 0, (b) 1, and (c) 2 water molecules, as a function of 
C=O bond length. A o value of 47 has been used. 


The frequency distribution is significantly more complicated in water 
than in methanol, and in order to separate the various contribution the 
FHB function has to be once again adopted. 


A detail of the probability distribution for three different solute-solvent 
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interactions of MA, H-bonded with 0, 1 and 2 water molecules, is shown 
in fig. 7.12. These isosurfaces have been obtained starting from fig. 7.11, 
panel b, counting the configurations with values of FH? lower than 1074 
(fig. 7.12, panel a) or with one or two values of F#® greater than 10~ (fig. 
7.12, panels b and c, respectively); the overall pattern is in agreement with 
the spectral interpretation given above. 


7.3 Discussion 


This research showed that wavelets can be used to correlate the stretch 
frequency of a localised normal mode (in these cases, the C=O stretch) 
with an inter-molecular structural property (the bond length). The WT 
allowed to split the frequency doublets into components due to specific H- 
bonding arrangements, whereas standard Fourier analysis was not even 
able to show us a clear doublet (see fig. 7.10). 

The case of MA in methanol is particularly lucky, because in that sol- 
vent MA forms alternatively just 1 or 0 H-bonds: it is a sort of binary 
choice, and wavelets completely elucidated the solvation behaviour. In 
water it is more difficult to sort the VDoS into the various H-bonding con- 
tribution, and the resulting spectrograms (fig. 7.12) require more caution 
to be interpreted, but the agreement is still qualitatively very good. 

Remembering that MA was originally chosen as a target for 2D IR 
experiments due to its similarity with amino acids, this wavelet-based 
approach has great potential to be extended to biomolecules. 
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8. Water Solvation of Thiazole 


The adsorption of molecules on metal surfaces is useful to understand re- 
actions of heterogeneous catalysis or electroactivation in electrochemical 
cells. These processes can be probed and studied by surface-enhanced 
Raman scattering (SERS), owing to the giant enhancement of the Raman 
signal of adsorbates on high-reflectivity metals like silver, gold and cop- 
per. 

Among SERS substrates, silver surfaces are the most efficient for en- 
hancing the Raman signals of adsorbates, since for this metal high elec- 
tromagnetic enhancements are expected by laser excitation in the overall 
visible region. In particular, water dispersions of silver nanoparticles offer 
important advantages with respect to other SERS platforms, such as thin 
films, island crystalline surfaces or other substrates roughened by chemi- 
cal or electrochemical treatments. 

However, chemisorption in metal hydrosols is a quite complex process, 
which involves the solvation of molecules in the aqueous environment, 
the ligand affinity to the metal, and the presence of active-sites (adatoms 
or adclusters) at the surface of the metal nanoparticles. Hence, a good 
understanding of the interaction of the adsorbates with water is relevant 
to understand the competition between solvation and adsorption on a 
metal surface. 

It has been found that thiazole adsorbed on aggregated Ag particles 
gives a strong SERS response mainly due to chemisorption [MMPMMS11], 
thus the dynamics of thiazole in water has been approached by Dr. Pagliai 
using CPMD calculations, while I have analysed the structural and vibra- 
tional features of the simulated trajectory. 


8.1 Ab Initio Molecular Dynamics Details 


Ab initio MD simulation have been performed with the CPMD package 
[CPM] on a system made up of 64 heavy water and 1 thiazole molecules in 


Francesco Muniz Miranda, Modelling of spectroscopic and structural properties using molecular dynamics 
ISBN 978-88-6655-690-9 (online), CC BY 4.0, 2014 Firenze University Press 


Modelling of spectroscopic and structural properties using molecular dynamics 


Figure 8.1: Snapshot of the simulation box adopted to simulate thiazole in water. 


the NVE ensemble. The sample has been simulated in a cubic box of edge 
length 14.3683 A, with periodic boundary conditions. BLYP exchange and 
correlation functional [Bec88, LYP88] has been adopted, along with norm- 
conserving Martins-Troullier pseudopotentials [TM91] and Kleinman-Bylander 
decomposition. [KB82] Plane wave expansion has been truncated at 85 Ry 
while a 400 atomic units (a.u.) electronic fictitious mass has been set. The 
system has been simulated for about 26 picoseconds with a time-step of 4 

a.u. (~ 0.096 femtoseconds). The average temperature was 302 + 16 K. 
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Figure 8.2: (blue lines) g(r), h(r), g(0), h(@), (red line) n(r) for the N- - - D interac- 
tions; r is the N- - - D distance whereas @ is the N---D—O angle. 
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8.2 Simulation Analysis 
Structural and Electronic Properties 


Instead, in fig. 8.2 a and c are reported the pair radial and angular 
distribution functions g(r) and g(0), along with the integration number, 
n(r), for the N - - - water interaction. As expected, the nitrogen atom of thi- 
azole acts as H-bond acceptor, whereas no such feature has been observed 
for the S atom. This different behaviour of the two heteroatoms is further 


Figure 8.3: Contour plots of the ELF of thiazole interacting with water molecules 
from a snapshot of the CPMD simulation. 


illustrated by fig. 8.3, which shows ELF contour plots (see Section 3.2) 
near the solute molecule. The ELF isosurfaces show that a charge transfer 
along the N- - - D direction occurs, whereas no such thing is observable for 
the S atom. On the contrary, around the S atom there is a systematic void 
to the water molecules that to move away from it. 


Vibrational Properties 


It would be interesting to probe the normal mode associated to the N 
atom motions inside thiazole. Unfortunately, N atom is involved in sev- 
eral non-localised normal modes, thus to extract them from a simulation 
would be a very complex task [Key97]. 

Thus, to probe the system reaction to H-bonding, I choose to study 
the VDoS of the water molecules involved in H-bonds with thiazole. In 
fig. 8.4 the reconstructed VDoS of the simulation cell is reported in the 
range associated to OD stretches. As can be seen, the agreement with the 
experimental Raman spectra is very good even adopting FT. Nevertheless, 
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Figure 8.4: Comparison of experimental Raman spectrum of thiazole in D2O 
(red upper line) with vibrational density of states (lower blue line) calculated by 
FFT of the velocity autocorrelation function (obtained by the CPMD simulation) 
corrected by a uniform scaling factor of 1.05. 


to correlate vibrations and structural properties, I once again employed 
wavelets as shown in fig. 8.5. Water, like the diols of Chapter 6, shows the 


Figure 8.5: (upper panel) Maximum of the OD stretching band time-resolved fre- 
quencies of the water molecule H-bonded to thiazole; (lower panel) time distri- 
bution of the H-bond distance. In brown, the raw datasets; in black, the averaged 
quantities; and in blue, the 10'" degree polynomial fitting are reported. 


same correlation between H-bond length and vibrational frequency. This 
could be presented also in the form of the “banana shaped” spectrograms 
already seen in Chapter 6, as reported in fig. 8.6. However, in this case, 
the plot has been constructed weighting the VDoS of the O—D stretching 
mode by the FHB H-bond function (see Section 3.8), in order to sort out 
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Figure 8.6: Weighted probability distribution of the OD stretching mode as a 
function of the N- - - D inter-molecular length. 


the contribution due to water molecules interacting with each other. This 
trick was not required for diols. 


8.2.1 Discussion 


In this research the CPMD simulation analysis showed that thiazole mainly 
interacts with water through its nitrogen atom and not via the sulphur 
atom. I have probed the VDoS of water performing Fourier and wavelet 
transforms of the O—D time series, but this approach is slightly under- 
mined by the normal mode coupling between the two different OD groups 
present in a water molecule. In the next Chapter, a study of the dynam- 
ics and vibrational properties of water shall be again presented. Partially 
deuterated water (HOD) has been employed for the research presented in 
the next Chapter, where the O-D and O—H normal modes are signifi- 
cantly decoupled occurring in two very distinct spectral regions. 
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9. Bifurcated Hydrogen-Bond of Confined Water 


H-bond occurring in water is closely related to the most peculiar prop- 
erties of this solvent, and it has been the subject of many experimen- 
tal [ISS*06,Sti80,CS05,CBH*05,LRT06a,KCP*08,Ch008,PHMM08,RRTO09, 
BS10] and computational studies [OT9,CIKP03,KMM *04,PXV09,ZDG10]. 
As a matter of fact, the network of water molecules evolves in time ex- 
ploring different types of H-bonds during their formation and break- 
ing [LS03, LRTO6b, SL10]. In particular, Loparo et al. [LRTO6b] speculated 
about the mechanism of H-bond creation and breaking and proposed two 
alternative models to describe this phenomenon: in one model the switch- 
ing of H-bonding partners goes through a stable intermediate, otherwise 
through a transition state. 

This evolution, whatever the mechanism is, can be viewed as succes- 
sion of different transient structures occurring in the dynamics of water 
molecules; recently Laage and Hynes [LH06, LH08] proposed a bifurcated 
model to describe the transition state of H-bonded species as they switch 
from one H-bond acceptor to the other. 

This type of bond is deeply involved in the formation of biologi- 
cally relevant structures [ZM92], but, actually, in pure water bifurcated 
or strained H-bonds have extremely short lifetime (~ 100 femtoseconds 
as an order of magnitude), as shown by 2D IR studies [FEL*03, ELF*05, 
RRN*11]. Thus, it is impractical to study bifurcated H-bonds with ab initio 
MD simulations in liquid water. 

However, Pandelov et al. [PPWI09] recently showed that the geometry 
and dynamics of those peculiar structures are, instead, well accessible for 
water molecules confined in crystalline ionic hydrates. In fig. 9.1, Pande- 
lov et al. [PPWI09] reported the IR spectra of various hydrated salts, two 
of them (hydrated NaCl and LiCL) showing multiple structured bands 
in the OH stretching region. Werhahn et al. [WPXI11] reported a simi- 
lar spectrum shape for lithium nitrate trihydrate, as shown in fig. 9.2. 
The structured bands of the OH stretching modes are a clue that water 
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Figure 9.1: FTIR spectra of various aqueous solutions (see inset). The samples 
are measured in the liquid (275 K, red dashed lines) and in the solid phase (200 
K, blue solid lines). The figure has been taken from Ref. [PPWI09]. 
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Figure 9.2: IR spectra of LiNO3-(HDO)(D20)2. (a) The absorption spectrum of 
the liquid sample at 305 K (black line) is compared to that of the crystalline sam- 
ple at 220 K (green line). Three distinct peaks appear upon crystallisation, point- 
ing toward discrete, distinguishable H-bonding environments of the OH groups. 
(b) Comparison of the polarisation-resolved and IR spectra. The polarisation- 
resolved measurements resulted in parallel (blue line) and perpendicular (red 
line) signals. For the definition of the parallel and perpendicular planes of polar- 
isation, refer to [WPXI11] and the description of the used polarisation-resolving 
laser setup. The sum of these signals (green line) reproduces the IR spectrum 
quite accurately. The figure has been taken from Ref. [WPXI11]. 
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molecules are engaged in H-bonds of different strength and stability, thus 
hydrated NaCl, LiCL and LiNO; salts are good candidates to contain wa- 
ter involved in this elusive bifurcated H-bond. 

In fact, lithium nitrate trihydrate is an inorganic salt where the water 
molecules are forced by the lattice forces in a characteristic crystalline ar- 
rangement [HTO77,HTO80]. In this crystal, water is involved in three dif- 
ferent types of H-bonds [WPXI11], a strong one between water molecules, 
a weak one between water and nitrate anions, and a bifurcated one where 
the hydroxyl group of water is directed halfway between two N—O bonds 
of nitrate (see figures 9.3 and 9.4). Due to the crystalline nature of this 


a) b) 


Figure 9.3: (a) Crystal structure of LiNO3-(HDO)(D20)2. The crystal structure 
belongs to the orthorhombic group Cmcm with the OH groups of the water 
molecules all lying in planes perpendicular to each other. These planes are pro- 
jected in panels b and c for a better view. (b) Plane showing the bifurcated 
H-bond giving rise to the absorption line indicated in red in fig. 9.2, panel b. (c) 
Plane showing two types of hydrogen bonds: a weak bond to the oxygen atom of 
a nitrate anion, and a strong, ice-like bond between two water molecules, which 
results in the blue absorption spectrum in fig. 9.2, panel b. This figure has been 
taken from Ref. [WPXI11]. 


salt, about a third of water molecules is engaged in bifurcated H-bonds 
that persist for quite long time, in contrast to what happens in pure liq- 
uid water. Werhahn et al. [WPXI11] have recently characterised crystalline 
LiNO3 - (HOD)(D20)2 by means of IR spectroscopy and interpreted the 
OH stretching region of the IR spectrum as due to the superposition of 
the stretching bands of the hydroxyl groups engaged in strong, weak, and 
bifurcated H-bonds. 


As the bifurcated H-bonds lie in crystallographic planes orthogonal to 
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Figure 9.4: Simulation cell. Some selected H-bond interactions are represented 
with dotted blue lines. Water molecules and nitrate ions, whose interactions are 
not explicitly shown, are represented with translucent cyan and green colours, 
respectively. 


those where the water molecules form weak and strong H-bonds, Wer- 
hahn et al. [WPXI11] were able to discriminate the three strong peaks 
occurring in the OH stretching region from polarised IR spectra. More- 
over, they measured slightly different lifetimes for these types of H-bonds, 
ranging from ~ 1 to ~ 2 picoseconds. Fortunately, these lifetimes are ac- 
cessible to ab initio MD. 


9.1 Computational Details 


9.1.1 Ab Initio Molecular Dynamics Simulation 


In this study, my aim was to give a molecular level interpretation of the 
experimental findings very recently obtained by Werhahn et al. [WPXI11], 
adopting a computational approach based on Car-Parrinello simulations 
and WT. Making use of the CPMD code [CPM], I have simulated a LiNO3 
-(HOD)3 crystal composed by 48 water molecules, 16 Lit cations, and 
16 NO, anions in an orthorhombic box of 13.6036 x 12.7132 x 11.998 
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A? volume, with periodic boundary conditions. I chose the structural 
parameters reported in the supporting information of Ref. [WPXI11] as 
starting configuration for the simulation. In this lattice, two “families” of 
HOD molecules are present: 16 of them are engaged in two bifurcated 
H-bonds, whereas 32 molecules form simultaneously one weak and one 
strong H-bond. I had care to dispose the HOD molecules so to have an 
equal number of H-bonds formed along the O—H and the O—D direc- 
tions for every type of H-bonds (i.e. 16 weak, 16 strong, and 8 bifurcated 
H-bonds along either the O—H and the O—D directions). Moreover, HOD 
molecules are disposed to break the crystal symmetry, in order to avoid 
spectral contribution from higher K-points for a better comparison with 
the experimental data. Thus, I have 8 bifurcated H-bonds along O—H di- 
rection, 8 bifurcated H-bonds along O—D, 16 weak H-bonds along O—H, 
16 weak H-bonds along O—D, 16 strong H-bonds along O—H and 16 
strong H-bonds along O—D. HOD molecules were disposed in the crystal 
lattice so to break the crystal simmetry, in order to avoid spectral contribu- 
tion from higher k-points for a better comparison with the experimental 
data [WPXI11]. The BLYP [Bec88, LYP88] exchange-correlation functional 
was employed along with GTH pseudopotentials [GTH96], the wavefunc- 
tions were expanded in a plane wave basis truncated at energies of 85 Ry, 
while the fictitious electronic mass was 400 a.u. The equations of motion 
were integrated with a time-step of 4 a.u. (~ 0.096 femtoseconds) over 17.6 
picoseconds at temperature of 224 + 9 K in the microcanonical ensemble 
(NVE), after a thermalisation of ~ 3 picoseconds by velocity rescaling. 


9.1.2 Dipole Moment Calculation 


MIWF centres [MV97,SMVP98] were calculated every 128 a.u. ( 3.072 
femtoseconds) to obtain molecular dipole moments. Within the linear re- 
sponse theory (see Section 3.5.2), I have calculated the contribution to the 
IR absorption spectrum due to water molecules by Fourier transforming 
the molecular dipole time series according to the equation 3.24. 

Saving the MLWF centres at ~ 3 femtoseconds intervals for a ~ 17.6 
pico-second-long trajectory, I have a resolution of ~ 1.9 cm! and a maxi- 
mum accessible frequency of ~ 5500 cm! for the calculated spectra. The 
dipole moment has been projected along the unit vectors in order to en- 
hance the contribution to the spectrum due to OH/OD stretching mode. 
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9.2 Analysis of the Trajectory 


I calculated the IR spectrum of LiNO3-(HOD)3 by FT of the dipole mo- 
ment of the whole simulation cell. The result is presented in fig. 9.5 (up- 
per panel); in the lower panel I report a magnification of the OD and OH 
stretching region, in order to sort out the high intensity peaks due to lat- 
tice modes and those at 1100—1500 cm~!, which are due to the vibrations 
of nitrate anions [WS61,SCWJ66, WSKO1]. These latter modes are clearly 
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Figure 9.5: Panel a: IR spectrum of the simulation cell ; panel b: magnification of 
the 2000-4000 cm! region. Frequencies are uniformly scaled by a factor of ~ 
1.027. 


separated from the rest of the spectrum, but their analysis is difficult due 
to the congestion of spikes and peaks that make it difficult to extract the 
actual band shapes. Therefore, I calculated the IR spectrum for O-H 
and O—D stretching adopting MLWF centres, as explained in the previ- 
ous section: projecting the dipole moment along the O-H and O—D unit 
vectors is equivalent to the well established one-phonon approximation 
(see for example Cardini et al. [COK85]), which is adopted here to dis- 
criminate and highlight the contributions due to a specific normal mode 
(the OH or OD stretching mode). The experimental spectrum [WPXI11] 
for LiNO3-(HOD)(D2O)2 is reported in panel a of fig. 9.6: Werhahn et al. 
assigned the peak centered at ~ 3380 cm! to the OH stretching mode of 
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water molecules forming strong H-bonds, the intermediate one at ~ 3480 
cm! to the OH stretching of molecules forming bifurcated H-bonds and 
that at ~ 3535 cm! to the OH stretching of molecules that form weak 
H-bonds. As the weakly and strongly interacting OH groups lie on the 
same crystallographic plane, orthogonal to that of the bifurcated H-bonds, 
they measured the contributions of weakly and strongly bonded water 
molecules in parallel polarised IR spectra, while that due to bifurcated 
hydrogen-bonded water molecules appears only in the perpendicular po- 
larisation. 
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Figure 9.6: Panel a: experimental IR spectra [WPXI11] of LINO3 - (HOD)(D20)2 
for perpendicular (blue line) and parallel (red line) polarisation, and their sum 
(green line); panel b and c: calculated IR spectra of LiNO3 - (HOD) 3 for wa- 
ter molecules interacting with weak and strong H-bonds (blue lines), bifurcated 
bonds (red lines) and their sum (green lines) in the spectral region of both OH 
(panel b) and OD (panel c) stretching modes. Frequencies are uniformly scaled 
by a factor of ~ 1.027. 
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The same separation of the peaks is obtained in our calculated spec- 
tra (shown in panels band of fig. 9.6 for O-H and O—D stretchings, 
respectively): in fact, we have calculated the dipole moment time series 
for each water molecule, and grouped the spectral contribution according 
to the strength of the H-bond: every molecule forming a strong H-bond 
with one of its hydroxyl groups has the other OH group engaged in a 
weak H-bond. These two contributions then appear together in the cal- 
culated spectrum (blue lines in panels b and c of fig. 9.6), whereas the 
contribution of the water molecules forming bifurcated H-bond appears 
separated (red line in panels b and c of fig. 9.6). Our results fully confirm 
the assignment [WPXI11] of the three spectral components for both OH 
and OD stretching modes, in order of increasing frequency, to strong, bi- 
furcated and weak H-bondings. Also the relative bandwidths agree with 
the experiment, the central peak being substantially broader (both for OH 
and OD stretching modes) then the lateral ones. Werhahn et al. [WPXI11] 
tentatively attributed the large bandwidth (80 cm!) of the central peak 
to librations of bifurcated H-bonded water molecules in the plane defined 
by the two nearest NO, anions. From static ab initio calculations they 
obtained a potential energy profile along the librational coordinate with 
1 kcal/mol deep well centred at 0 degree. I extracted the distribution of 
this angle from our simulated CPMD trajectory: the result is summarised 
in fig. 9.7. The main feature reported by Werhahn et al. [WPXI11] is recov- 
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Figure 9.7: Libration angle distribution for every water molecule interacting with 
bifurcated H-bonds along the simulation run. 
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ered, but the maxima of the probability distribution are centered at + 8 
degree. I noticed that this value agrees with that obtained by Ramasesha 
et al. [RRN*11] from simulations and 2D IR experiments on pure water. 
This could suggest that water molecules involved in bifurcated H-bonds 
at finite temperature oscillate between two slightly displaced positions, 
symmetrically closer to one of the two nitrate groups. 

ELFs (see Section 3.2) have been calculated and projected onto two nor- 
mal planes of the lithium nitrate trihydrate lattice, as reported in fig. 9.8 
for the optimised structure [WPXI11]: weak, strong (A panel) and bifur- 
cated (B panel) hydrogen-bonds are displayed. For the strong interaction, 
a larger charge transfer occurs, as can be appreciated from the fact that the 
ELF isosurfaces of strongly bonded water pairs overlap slightly, whereas 
for weak and bifurcated interactions there is a clear separation. 


A 


Figure 9.8: (A) ELF projected onto the plane containing weak and strong H-bonds. 
(B) ELF projected onto the plane containing bifurcated H-bonds. 


The analysis of the dipole moment dynamics by means of WT is of help 
in the interpretation of the strong and broad band attributed to bifurcated 
H-bond. Exploiting the wavelet ability to localise a signal both in time and 
in frequency, I can correlate in time, with steps of ~ 3 femtoseconds, the 
most intense frequency of the IR wavelet spectrum (calculated replacing 
FT with WT in equation 6.13) with the H- - - O inter-molecular distance of 
the two nearest neighbours of each water molecule, as illustrated in fig. 
9.9. The result of this analysis is reported in panels a and b of fig. 9.10, 
where the correlation is displayed as two-dimensional contour plot. 
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Figure 9.9: In the wavelet spectrograms, the OH stretching frequency has been 
correlated with the instantaneous inter-molecular distances between the OH 
group and its two nearest oxygen atoms. 


In previous works, wavelets have been employed to recover from MD 
simulation the inter-molecular structure-frequency correlation of the O-D/O—H 
stretching band [MRH04, LS03, RRT09] for water [MSC08a, MSCO8b] and 
glycols, leading to “banana-like” plots, whose global shape is not very 
sensitive to the specific molecular system. 


Conversely, since I included both first and second neighbours (in view 
of the ambiguity of such definition for the water molecules interacting 
with bifurcated H-bonds) in the present analysis major differences appear 
between weak and strong (panel a) and bifurcated (panel b) hydrogen- 
bonding interactions. The Wavelet plot of the formers clearly shows two 
separated spatial regions: hydrogen-bonded molecules are within the first 
2.4 A, where the IR activity thickens at frequencies of around 3540 and 
3380 cm! due to weak and strong H-bonds, respectively; a second re- 
gion at distances larger than 2.4 Ä contains slightly broader peaks at 
the same frequencies, but no overlap between the IR activities of the two 
zones occurs. On the other hand, Wavelet analysis shows a definite con- 
tinuous pathway between the two regions for the bifurcated-bonded wa- 
ter molecules, corroborating the assumption that the IR profile of those 
molecules is modulated in time not by one of the two H---Oxo; dis- 
tances, but by both of them. These graphs can be seen as an evident 
“signature” of bifurcated hydrogen-bonding. 


Experiments of Loparo et al. [LRTO6b] are consistent with the assump- 
tion that bifurcated H-bond configurations are visited only transiently 
during the switch of bonding acceptor. Our calculated angle distribution 
(see fig. 9.7) supports this finding. Moreover, with WT I calculated the 
distribution of the OH stretching frequency as a function of the H-bond 
length. Our calculation (see fig. 9.10) confirms the hypothesis of Loparo 


192 


Francesco Muniz Miranda 


a) b) 

3600 - J On x 
$ 06 >~ 
\ c 
E 3500 | J o £ 
s 
~ 2 
o 0.4 5 
© - 
2 3400 - 4 =) 2 
5 03 3 
E = 
% 3300 - | 02 F 
3 2 
0.1 g 
a 

3200 0.0 

1.5 2.0 25 3.0 15 2.0 2.5 3.0 
c) d) 

3800 1st neighbor | 2nd neighbor i oa 
3 
x [=] 
E 3500 | J 04 6 
[2] =] 
= 
= 2 
o 0.3 E 
2 A J 4 
2 3400 2 
2 02 S 
; 
en ] o g 
` ° 
o 
a 

3200 T T T T T T 0.0 

1.5 2.0 25 3.0 15 2.0 2.5 3.0 
H ... O distance / A H ... O distance / A 


Figure 9.10: Distribution of the most intense frequency of the IR wavelet spec- 
trum with the change of the two shortest H- - - O inter-molecular distances; panel 
a: weakly and strongly hydrogen-bonded water molecules; panel b: bifurcated 
hydrogen-bonded water molecules. Panels c and d show the same distribution 
reported in panel b splitted between the first and second neighbours. Wavelets 
plots are calculated with o = 247r. Frequencies are uniformly scaled by a factor 
of ~ 1.027. 


et al. [LRTO6b], if the probability distribution of frequencies is adopted 
in place of the free-energy profile and if the H---O distance replaces the 
fictitious “H-bond coordinate”. In panels c and d of fig. 9.10, the same 
distribution of probability is reported for the nearest (panel c) and sec- 
ond nearest neighbour (panel d) separately, in order to avoid cancellation 
effects in the plot relative to the bifurcated interaction due to the two 
nearest neighbours simultaneously. In the present case the shape of the 
two distributions is less extended along the H. - -O coordinate, since the 
mobility of molecules in a crystalline salt is much less than in a liquid. 
Moreover, the value adopted for the 7 parameter (2477) in fig. 9.10 is op- 
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timal for frequency localisation, but has poor time-localisation accuracy 
due to a Heisenberg-like uncertainty [Chu92]. Therefore, I set 7 at a much 
lower value (7t) to improve time-accuracy, as reported in fig. 9.11: with 
this choice the relationship between the time-dependent distances and 
the time-dependent frequency is more apparent, albeit with larger spread 
along the frequency axis due to time-frequency uncertainty. In particular, 
the “banana-shape” of the distribution [MSC08b, MSC08a] is largely re- 
covered in bifurcated-bonded molecules for the interaction with the first 
neighbour (panel c), while for the interaction with the second neighbour 
(panel b) the change of frequency with the distance appears to be the 
opposite. In fact, when the second neighbour moves away from the OH 
group, the first neighbour comes closer and the hydroxy] stretching mode 
shifts to lower frequencies. This proves that the two structure-frequency 
distributions are actually anti-correlated, an effect that in previous studies 
in liquid solutions [MSC08b, MSC08b] was not observable, due to the mo- 
bility of the first and (in particular) of the second solvation shells. Loparo 
et al. [LRTO6b] proposed two different schematic free-energy surfaces that 
describe the changes of OH stretching frequency in liquid water during 
the switch of H-bond partners along a fictitious “H-bond coordinate”, as 
reported in fig. 9.12. However, their plot reported in fig. 9.12 are pure 
speculation, it is just an artifact figure to illustrate some general concept, 
whereas my contour plot of fig. 9.11 comes from wavelet analysis of an ab 
initio MD trajectory. 


The lifetimes of the three types of H-bond have been obtained calcu- 
lating the autocorrelation functions adopting the procedure of Pagliai et 
al. [PCRSO3]. These correlation functions do not show a single exponential 
decay. The integral lifetimes, albeit with a significant uncertainty, range in 
the 1.6 — 2.1 picoseconds interval, which is of the same order of magni- 
tude of those reported by Werhahn et al. [WPXI11]. 


Differences between weak/strong hydrogen-bonded molecules and bi- 
furcated ones do not occur only in structural and vibrational properties, 
but also in the electronic structure, as analysed by MLWF centres. As 
a matter of fact, dipole moment distribution for the two families of wa- 
ter molecules lead to two different profiles, as reported in fig. 9.13: the 
distribution of the 32 weakly and strongly hydrogen-bonded molecules 
display a Gaussian-shaped profile centred at ~ 3 D, a value coincident 
with that previously calculated by Silvestrelli and Parrinello [SP99] and 
by Gubskaya and Kusalik [GK02] in liquid water. On the other hand, the 
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Figure 9.11: Distribution of the most intense frequency of the IR Wavelet spec- 
trum with the change of the two shortest H- - - O inter-molecular distances; panel 
a: weakly and strongly hydrogen-bonded water molecules; panel b: bifurcated 
hydrogen-bonded water molecules. Panels c and d show the same distribution 
reported in panel b splitted between the first and second neighbours. Wavelets 
plots are calculated with o = rt. Frequencies are uniformly scaled by a factor of 
~ 1.027. 
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Figure 9.12: OH stretching frequency distribution spreaded along the fictitious 
“H-bond switching /breaking coordinate”. Figure is taken from Ref. [LRTO6b]. 
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Figure 9.13: Dipole moments distribution for the water molecules. Blue and red 
histograms correspond to the distribution of the weakly/strongly and bifurcated 


hydrogen-bonded molecules, respectively. Both histograms are normalised to 
100% . 


population of the remaining 16 bifurcated hydrogen-bonded molecules 
shows a bimodal profile of the molecular dipole moments distribution. 

I tentatively associated the two maxima occurring in the red distri- 
bution of fig. 9.13 to the instantaneous presence of distorted bifurcated 
H-bonds, two of the four H-- - Oyo; distances being significantly longer 


(ie. > 0.5 Ä) than the others, and we attribute the central minimum to a 
perfectly bifurcated arrangement. 
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II 


Concluding Remarks 


The main leitmotif of this dissertation is hydrogen-bonding, which I 
have studied in some molecular systems adopting molecular dynamics 
and wavelet analysis. 


In the case of the dipeptides in a phospholipid bilayer (see Chapter 5), 
the analysis of the hydrogen-bonding interaction between C=O groups 
and water provided relevant insights about the position of the dipeptides 
into the membrane, filling some interpretative voids left unanswered even 
by 2D infrared experiments. In fact, the information obtained by means 
of 2D infrared experiments often needs complement from calculations. In 
particular, the plausibility of some structural deductions usually requires 
molecular dynamics simulations to be substantiated. While the study of 
systems made of tens of thousands atoms is challenging, the accuracy in 
spectrum reconstruction offered by classical molecular dynamics does not 
allow elucidating many vibrational features. Moreover, a lot of modern 
implementations of molecular dynamics, in order to better sample the 
phase-space, is nowadays completely disconnected from the very idea of a 
true time-resolved trajectory (e.g. the “replica exchange method”), which, 
on the other hand, is pivotal in vibrational analysis of molecular dynamics 
datasets. As a matter of fact, a frequency analysis of the simulation cannot 
be directly performed without a true trajectory, and the only other viable 
option left is to extract some configuration and then perform harmonic 
static ab initio calculations on these configurations. 


Therefore, I mainly adopted ab initio molecular dynamics to study 
smaller systems at higher level of accuracy. 

I have employed this approach to study diols in water (see Chapter 
6) and their vibrational response changing in solvation. A correlation 
between the instantaneous vibrational frequency and change of the envi- 
ronment was found, allowing to explain the inhomogeneous broadening 
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of the O—D stretching mode directly from simulated data. This correla- 
tion was previously assessed only adopting mixed techniques at different 
levels of theory (i.e. classical MD simulation followed by ab initio static 
computations on selected configurations, as hinted before) and for much 
simpler systems, made up of relatively few atoms. 


While these results were consistent and encouraging, I then searched 
for amore direct comparison with experimental data, simulating propane- 
diol in acetonitrile. In fact, with this choice of solvent the vibrational 
signals of solute (the O—D stretching mode of propanediol) and solvent 
(acetonitrile) do not overlap anymore. Adopting wavelets and ab initio 
molecular dynamics I found an intriguing structural feature, an intra- 
molecular hydrogen-bond of propanediol, which was only observed and 
studied before in the gas phase with IR experiments and some static ab 
initio calculations. It was surprising that this feature could manifest it- 
self in a condensed phase simulation, also because I did not observe it in 
the aqueous-phase simulation of propanediol, thus suggesting that, while 
in water the intra-molecular hydrogen-bond is not competitive compared 
with solvation, this could be true in acetonitrile. Static ab initio calcu- 
lations subsequently proved this, giving comparable results for the free 
energy of clusters of propanediol and acetonitrile with and without this 
intra-molecular feature. This predicted hydrogen-bond was also tested by 
new infrared experiments in condensed phase, which actually supported 
the computational finding, hence strengthening the common view of com- 
putational methods as reliable “in silico experiments”. 


In this system I have also tried to recover another and much more sub- 
tle vibrational effect, the blueshift of the C=N stretching band, which is 
due to hydrogen-bonding. This was a challenging task, because the usual 
effect of hydrogen-bonding is to redshift the vibrational frequencies of the 
involved groups. It was not taken for granted that a density functional 
theory approach, as implemented in ab initio molecular dynamics, was 
able to reproduce this effects. And, to tell the truth, first results seemed to 
highlight a minor shortcoming (after all, the blueshift is of the order of few 
wavenumbers) of this computational method. In fact, static ab initio calcu- 
lations were able to recover the blueshift, suggesting that this issue with 
the ab initio molecular dynamics simulation was not due to some “theo- 
retical” shortcomings (e.g. the adopted functional), but probably caused 


200 


Francesco Muniz Miranda 


by statistical limitations. And this was indeed the case, since, performing 
a new simulation at a lower temperature, the effect was largely recovered. 
Albeit this approach is slightly unorthodox, a similar method has been 
recently exploited by Nicolini et al. [NC09, NFC11] to perform “smarter” 
simulations. In the specific case of propanediol in acetonitrile, I resorted 
to this “trick” in order to “slow down” the dynamics, thus isolating the ef- 
fect of hydrogen-bonding onto the C=N stretch and sorting out this latter 


from all other environment rearrangements that could induce frequency 
shifts. 


Wavelets were also pivotal into elucidating the behaviour of methyl 
acetate in methanol and water (see Chapter 7), allowing the extraction of 
new interesting information even from old ab initio molecular dynamics 
calculations. In fact, while Fourier transforms could not even reproduce 
the frequency doublet of the C=O stretching band, the vibrational spec- 
trum as obtained by wavelet analysis showed it very clearly. This repre- 
sents a relevant increase in the ability of simulations to reproduce spectro- 
scopic features with wavenumber accuracy. Moreover, wavelets allowed 
us to associate each frequency component of the doublet to a specific inter- 
molecular situation. In this way, a complete assignment of the two peaks 
of the vibrational doublet was possible, obtaining a computational un- 
derstanding of the effects of the “chemical exchange” between different 
hydrogen-bonding states onto the vibrational spectrum, and assessing a 
one-to-one correspondence between peaks and structural realisations. 


Finally, I have shown how a combined approach of computational 
methods provides new significant insights on a somewhat elusive topic 
such as bifurcated hydrogen-bonding (see Chapter 9) in water, which 
is relevant because it is seen as an intermediate-step of hydrogen-bond 
breaking and formation in aqueous environment. Therefore, it is also piv- 
otal to understand how water switches between hydrogen-bonding part- 
ners. The infrared spectrum was reproduced with surprising accuracy, 
distinguishing the hydroxyl stretching of the water molecules engaged in 
bifurcated or “normal” hydrogen-bonds. 


Furthermore, wavelets allowed to extract a contour-map that basically 
spreads the infrared activity along the hydrogen-bond switching coordi- 
nate, providing more solid computational basis to one of the two compet- 
itive models proposed by Loparo et. al. [LRTO6b]. In fact, these calcula- 
tions anchored one of this two models to a real physical coordinate, at the 
same time reproducing the spectroscopic and structural features in excel- 
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lent agreement with the experimental data. Thus, even in this case (all 
other simulated systems of this dissertation were liquids) wavelet analysis 
makes possible to extract correlations between the structural and spectro- 
scopic properties, providing a better understanding of the physical origins 
of the experimental findings. 

Moreover, this approach also provides a peculiar “signature” of the 
bifurcated interaction, allowing (in principle) to recover it also in other 
molecular systems when and where it should arise. As a matter of fact, 
this computational approach based on time-frequency localisation can be 
easily generalised, and thus could be applied to pinpoint the contribution 
of bifurcated interactions to the vibrational spectra of different molecular 
systems, even when characterised by faster hydrogen-bonding dynamics, 
as in liquid water. 


In conclusion, I would like to point out the effectiveness of the com- 
bined use of high-level-of-theory molecular dynamics and wavelet trans- 
forms to monitor vibrational properties, in order to understand how these 
latter are affected and modulated by the environment. I have focused my 
research mainly to hydrogen-bonded systems because hydrogen-bond is 
the strongest of the inter-molecular interactions, but it is my hope that 
in future even weaker interactions could be realistically modelled and 
probed by this type of calculations. The continuous research in extending 
the density functional formalism to Van der Waals forces (see Ref. [SC11]), 
as well as the recent corrections due to Grimme (see Appendix C.4 and 
Ref. [Gri06]) for ab initio molecular dynamics, suggest optimism. 

Besides, the ever-expanding size of the molecular systems that can be 
studied by ab initio molecular dynamics, leads to hope that in future also 
the vibrational properties of large biomolecules could be computationally 
reproduced and fully elucidated. The combined approaches described 
and adopted in this dissertation represent a suitable mean to achieve this 
goal. 

Besides, while this dissertation is mainly devoted to hydrogen-bonding 
(due to its ability to shift vibrational frequencies up to hundreds of wavenum- 
bers), the possibility to reproduce the doublet in the vibrational spectrum 
of the C=O stretching of methyl acetate (with a splitting of just ~ 20 
cm!) suggests that weaker interactions could be probed with wavelet 
analysis too. To corroborate this hope, the finding that even the small 
C=N blueshift in acetonitrile (less than 10 cm!) can in principle be re- 
covered, which hints that a complete picture of the molecular behaviour 
and vibrations shall be at hand. 
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Appendixes 


A. Some DFT Details 


A.1 Proof of Hohenberg-Kohn Theorems 


The aim of this Appendix is to demonstrate that p(r) determines the nucleus-electron 
potential VN? save for an additive constant. Demonstrating this, is equivalent to demon- 
strating that there is a bijective correspondence between p(r) and VN“. 

The demonstration will goes on by ab absurdo reasoning, so let’s just assume that 
to two different potentials V° and V’N® correspond the same density p(r) for the non- 
degenerate ground state of the system. If this stands true, then we should have two 
different electronic Hamiltonians H® and H’* with different eigenstates |p) and |g’), such 
that they give the same total density p. If we apply the variational principle (eqn. 2.7) 
adopting |9’) as a test-ket for the H Hamiltonian, we obtain that 


Eo < (g/|Alg’) = (AI) + AH) = Eo + (pA Allg). (A) 
With the same reasoning and adopting the test-ket |p} and the Hamiltonian Ñ’ we obtain 
Eo < (g|H’l) = (pile) + (PH — Ale) = Eo + (lH’— Ale) . (A) 

Moreover, calculating (g'|H — H’|p') = (g'|VN° — V’Ne|’) we get 


(9 |A- H'|\g') = (o'|V^e — v'Neig') = / ery re VN?) o/ dr = 


= [ag [Vn - V'NeJar = | p(a)[VN* - VN dr = - (p| f - Ale) 
Thus, the two equations A.1 and A.2 become 


Eo < E) + / p(t) [VNe — v'Nejar (A3) 


and 
E} < Eo — J p(r)[vNe — v'Nelar . (AA) 


If we add eqn. A.4 to eqn. A.3 we obtain the following inequality 
Eo+Eg<EgQ+Eo - (A.5) 


But, if Ey + EQ = Ef + Ep was true, then the two states |p} and |g’) were energy degen- 
erate, violating one of the hypothesis. Besides, inequality Eg + Eb < Ef + Eo is a clear 
algebric contradiction. This demonstrates that there are not two different potential VN® 
and V’N® able to produce the same ground-state density p(r): from this follows that the 
correspondence between p(r) and VN? has to be bijective. This is the first Hohenberg- 
Kohn theorem. 
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The second Hohenberg-Kohn theorem is just an application of the variational theorem 
to the energy of the system written as a functional of the electronic density of the ground- 
state: 


E = (glAlp) = Elo] = Tp] + Vlo] + VNe[p] = Fule] + f ptdVNar ; 


here FAK = Tefp] + V“[p] is called “Hohenberg-Kohn functional” and does not depend 
on the specific electronic system, since it is just the sum of electronic kinetic energy (T°[p]) 
and electronic potential energy (V“[p). It is often said that FF is a “universal functional”, 
because it does not depend on the molecular geometry (i.e. the nuclear coordinates {Ry }) 
The second Hohenberg-Kohn theorem states that 


With a test electronic density o(r), provided that 6(r) > 0 and f 6(r)dr = n, 
we have that 
Eo < Ep] (A.6) 


This theorem follows immediately noticing that 


EI] = (@IAIG) = | BE)VNedr + FAK (p > Elp] = Eo 


A.2 Discussion about Kohn-Sham Equations 
In this Appendix a way to obtain the Kohn-Sham Equations (eqn. 2.21) is presented. 


following the treatment of Parr and Yang [PY89]. Equation 2.18 can be written in terms of 
single-electron wavefunctions as 


Ell = Lf e@(-3v gi()dr+ | drol yee en Parar + Esl 
i 
plr) = We [p(x x2; Xn) |?ds1dx2...dxy = Y |i (x)|? 
j 
P(X1,X2, we Xn) = L det [pgg] 
yn! 
Single-electron states |$') are, by a hypothesis, a complete basis, i.e. they should span 
the entire Hilbert space accessible to the system. Thus, if |p') are a complete basis, then 


they can always be arranged as an orthonormal basis with some unitary transformation, 
to obtain the orthonormality condition 


(Gilg!) = [de 
Now a new functional Y of single-electron states {|¢')} can be defined: 


Y{|¢')] = = E| [pl - ze pilo’) 


To minimise the E[p] functional with respect to the |$') single-electron states, then the 
following condition must hold 


dY(I$")] = 
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which is the analogous of eqn. 2.14. Adopting the variational calculus, these equations 
lead to 


a E. ; | 
esp") = 3% a al P) = Lele 
J 


Here he — |- AZ + ver] is the single-electron Kohn-Sham Hamiltonian, but the equa- 
tion still does not appear like eqn. 2.21. However, since the Kohn-Sham Hamiltonian is 
a self-adjunct operator, the matrix of its eigenvalues can always be made diagonal by a 


unitary transformation that leads us to eqn. 2.21. 


A.3 Exchange-Correlation Functionals 


The simpler functional: LDA 


The most crude approximation done in DFT is the so-called local density approximation 
(LDA). This approximation is rather inadequate to treat complex molecular systems, but it 
easily shows how the DFT equations work in practice. Within LDA one assumes that near 
r the function p(r) does not change significantly, so that one could write down Vp(r) = 0 
and 


ELPA = f p(r)elPAlplar , 


with eLPAfo] = eLP4 [p] + eLPA[p] being the exchange-correlation energy of a single elec- 
tron in a constant-density gas. The energy eLP 4 is assumed to be the sum of an exchange 


energy derived from Thomas-Fermi-Dirac theory 


PAi = -F (Žo) 


and a correlation energy that is obtained by previous computer simulations (mainly Quan- 
tum Monte-Carlo). The resulting exchange-correlation potential is 


ELPA eLPA 
yLDA — XG — LDA xc 
A = ry = ee Lol + OS , 


which, when put into the equation 2.21, gives the LDA Kohn-Sham equation 
1 tj aeLDA ; ; 
f- SVE UNCED f PE an + A =. (A 
p 1] 


It has to be noticed that the LDA functional, albeit simple and inaccurate for molecular 
systems, nevertheless should be considered fully ab initio. In fact, the more complex 
functional discussed in the following are both more complex and accurate, yet they contain 
some part of empiricism. 


BLYP, a GGA Functional 
To deal with real quantum chemical problems, nowadays LDA is discarded and new DFT 


methods are employed. DFT methods that take into account the first derivative of the 
electron density (i.e., such that Vye = Vxelp(r), Vo(r)]) are called Generalized Gradient 
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Approximation (GGA) methods. The first corrective term to the exchange energy is usally 
of the type: 
> 00,5 u | Ve(r) |? 
zg 87) TOE dr j 
where the gradient of the density is present. An analogous formula exists for the corre- 
lation energy. One of the first and most widely used functional of this type was the B88 
functional proposed by Becke [Bec88] as exchange correction to the LDA functional: 


E888 = ELDA + AEPS 


2 
AEBS — 1/3 X 
Pp 1+ 66x sinh! x 
| Ve | 
= ; 
p4/3 


here the 6 parameter is obtained by fitting of rare gas data. Just this simple correction 
improves the LDA results of two order of magnitude. A widely adopted correlation cor- 
rection is included into the LYP functional, proposed by Lee, Yang and Parr [LYP88], 
containing four parameters (a,b,c and d) extrapolated from Helium atom data fitting: 


LYP _ pte p P- 2/3 +)8/3 18/3 
E = KaT abw { T [144(2 NCEE IA ATS] 
+(47 — 78) | Vp |? = (45 = ô)(| Vet |? +| Ver |?) 
+2071 (11 -8)(e* | Vot ? +07 | Ve~ 23] } (A8) 
| _ ep[-cp!/3] 
w J3 + dp 173) (A.9) 
do` 1/3 
_ -1/3 e 
ô = co + Trap ’ (A.10) 


with pt and p” densities of the spin states. In fact, there is no correlation if all spin 
vectors are parallel to each others. The B88 and LYP functionals are often used together 
and, when this happens, it is usual to refer to this pair as “BLYP” functional. The BLYP 
functional is the more adopted functional in this dissertation, for a number of reasons, the 
main ones being the good accuracy in reproducing static and dynamic properties, and the 
fact that it is implemented inside the CPMD suite of programs. 


An hybrid Functional: B3LYP 


Basically, hybrid functionals (sometimes called Hyper-GGA) are GGA functional where 
the exact Hartree-Fock exchange term is added as a corrective term. The HF exchange 
integral has the form: 


1 ‘ 
Kj = f | HB (x1) — 9} 02)" dd. (A.11) 


Since hybrid functionals mix together HF and DFT features, the reason of their name 
is obvious. One of the most employed hybrid functionals is the so-called B3 exchange 
[Bec93b, Bec93a], which is dependent from three parameters a,b and c extrapolated by 
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data-fitting, and which is often combined with the LYP correlation to obtain the B3LYP 
functional: 


EROP — (1 — a) EZP4 + a EZ? + BAERS + (1—c)EPPA + CE? ; 


here with the symbol BBE I mean the total exchange energy of the HF method, which is 
defined as a sum of the n double-electron integrals K;;: 


=- IK, 


i j<i 


The exact HF exchange energy is scaled by a factor a: this is due to the fact that an exact 
exchange long range energy (i.e. EFF) is not compensated by an approximated long range 
correlation (i.e. E-Y?) as the theory requires (see [ea01]). 

Unfortunately, the B3LYP is not currently integrated into the CPMD package, nor 
other Hybrids, because it is difficult (both at the practical and theoretical level) to imple- 
ment HF exchange with plane waves as basis set. Anyway, the B3LYP functional, due to 
its effectiveness in reproducing many molecular properties, has risen to a position of “ref- 
erence DFT calculation” and in the present dissertation is often used in static calculations 
in order to check the quality of BLYP calculations. 

I have also to pinpoint that calculations with LDA or a GGA functional are much 
faster that analogous calculations carried out with an hybrid one. This is due to the fact 
that the HF exchange (eqn. A.11) is non-local, because cannot be written as an integral of 


the type 
| / BAG Gee (A.12) 
J. r12 

due to the presence of the mixed terms $;(x1)7 (x) and $* (x2)! (x2) that prevent a 


simple reduction to single-electron densities (x1) and p(x) The integral in eqn.A.12 is 
the Coulomb integral (see for example ref. [Bec14], eqn. (16)). 
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B. Some MD Details 


B.1 Liouville Operator and Integration Algorithms 


The time-rate of a function A = A(q(t), p(t)) can be put as 


dA $N dA ðA 2NdA0H JAH 


ee ee (B.1 
dt 24 agi | Pap, 2 əq' op; pi Oq' l 
It can be created a Liouville operator iL defined as (see [PMM* df]): 
. 3N 9 a 3N 
IL = Lids at Pigg, = LFV tM = Va tPL)Ve ; (B.2) 
i l i 


the operator is self-adjunct and symbols Vq and Vp refer to the coordinate and momenta 
divergence, respectively. With the help of this operator, eqn. B.1 can be rewritten as 


dA 
— =iLA . B. 
g (B.3) 
This first order differential equation can be easily integrated to obtain 
A(q,p) =e" A(q(0),p(0)) . (B.4) 
It can be defined a new operator 
elt -= a(t) , (B.5) 


which is called “propagator” of the dynamics, because when acts on A(q(0),p(0)) it 
propagates the A function in time to obtain A(q(t), p(t)) due to eqn. B.4: 


A 5) _ ‚iLt (4) u (a) 
U t)- =e . = 
o- (ao p(0)) ~ (pin) 
Furthermore, it is a unitary operator, since we have 
unü()* — eilt ,—ilLt — L(t) =ð =i , 


with I representing the identity operator. This is an important property that provides that 
the equations of motion solved with such propagator are time-reversible as required by 
theory. In turn, time reversible integrators allow the conservation of energy, which usually 
drifts when approached with other algorithms. The Liouville operator can be thought as 
composed of two sub-operators, one acting on the coordinates, iL,(t) = q(t)Vq, and the 
other on the momenta, iL,(t) = p(t)Vp: 


iL(t) = iLg(t) + iLp (t) 
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using eqn. B.4, the action of ilL (t) can be underlined: 


(iILgt)? 
2! 


A(q(t), p(t)) = A(q(0), p(0)) + iL gtA(q(0), p(0)) + 


= LO)! A(g(0 yay vk A(q(0),p(0)) = 


= A(q(0) + #q(0 ),p(0 )) 


It can be appreciated that the elat operator has caused a time-shift of the coordinates. 
In the same way it can be demonstrated that operator Lrt can induce a time-shift on 
momenta. Thus, the combined effect of these operator is a time-shift of the entire state of 
the system and they can be applied to generate a continuous dynamics. However, the true 
propagator is eilt given in eqn. B.5, and this propagator has both the addends iL,(0) and 
iL,(0). Unfortunately, iL,(0) and ilL,(0) do not commute with each other 


ei, (0 )+iL,(0 ) e14 (0) IL, (0) 


This problem has been tackled by Trotter [Tro59] who proved the following theorem for 


two non-commuting operators a and £: 
ett = By (B.6) 


g 
F 


B 
lim (eetet) = lim (erete 


T>+00 T>+00 
Therefore, when T — +œ the propagator can be partitioned in two independent operators 
whose application is relatively easy. Obviously the limit T — +00 cannot be reached 
and this is a spring of numerical error, but it is unavoidable in the context of numerical 
simulations. Defining 


mE t 
— At 

a _ iLyt ; 

— = — =Atp(0)V 

T = p(0) p 
and Bi 

ao = Atg(0)V 

eo F ġ(0) q 
and exploiting Trotter’s formula, the propagator becomes 

U(t) = gilt = lim el, 7 t pil gAt iLp F ‚ (B.7) 
At-30,t-++00 


where At is the finite time-step adopted to integrate the equations of motion. Obviously, 
analytically solving the molecular classical equations of motion is not possible, and the 
time range is discretised in time-steps; during each time-step dynamical variables are 
assumed to be constant. To finally deduce the integration algorithm is useful to analyse 
how acts 


e iL, * 7 + pil, At oily T 
on the (classical) initial state vector 
ts) 
p(0) 
We find that: 
oily Y pillgAt ily , (ao) = ( q(0) + Atq($) ) = 
p(0) p(0) + S [p(0) + p(At)| 
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_ (2 + Atq(0) O) 
p(0) + Z[p(0) + p(At)] 
Thus, the so-called “velocity verlet” algorithm at every time-step At updates the state 
vector as 


Piltn) — Piltnzı) = piltn + At) = pi(tn) + > [pi(tn) T Piltn+1)] (B.8) 
i i i i i ; (At)? 
q'(tn) — q'(tn41) = F (tn + At) = q' (tn) + q'(tn)AE+ Pin) Z— (B.9) 


adopting the approximated propagator 


eLp% pil At pil $ 
It is immediate to check than equations B.8 and B.9 correspond to equations 2.34 and 2.36. 

Verlet-based algorithms are not the best ones to guarantee that energy and other 
relevant constant of motion are conserved from one time-step to the subsequent: as a 
matter of fact, “predictor-corrector” algorithms can succeed much better in this specific 
task and generate a trajectory that is much closer to the real one. The point is that verlet- 
type integration schemes display a better behaviour during the overall simulation, with 
small drifts in conserved quantities: this quality is often made explicit by saying that 
“verlets are less accurate but more robust”. Moreover, as pinpointed in Ref. [Mar04], pag. 
372, “the errors do not accumulate” since they are prone to erase each others. 


B.2 Replica Exchange 


In MD the mean quantity (0) is calculated as a time-average. On the contrary, in Monte- 
Carlo simulations (MC) the systems explores different geometric configurations not inter- 
connected by a true time-dependent trajectory, so the mean quantity (0) is computed as 
a true state-average. The replica-exchange (REM) [SO99] method aims to retain the best 
of both worlds and to speed up the exploration of the phase space, which is a critical 
requirement to investigate large molecular systems. Basically, with REM many different 
simulations are performed concurrently on the same system, but at different temperatures: 
they are called replicas. The dynamics evolves according to normal MD in all replicas, but 
from time to time a MC move is attempted to switch configurations between replicas 
according to the Metropolis acceptance rule. 

Let’s follow the dynamics of the 300 K replica, for example, and imagine that after 
some MD steps configuration are switched with the 500 K replica. When the switching 
occurs, the velocities of the 500 K replica are rescaled to match the 300 K requirement, and 
vice versa. If we want to calculate properties at 300 K, we have to focus just onto the 300 K 
replica event after the switching with the 500 K replica. 

It has been demonstrated that this approach is able to speed up computations, albeit 
forcing to perform many different simulations. A slightly different REM scheme, called 
“hamiltonian-REM” [FWT02], requires that the various replicas do not differ in tempera- 
ture but in the Hamiltonian: within this approach, the classical potentials of the different 
replicas are scaled by a factor that weaken or strengthen interactions. This is similar to 
“normal” REM, since the dynamical effects of rising temperatures (i.e. kinetic energy) or 
decreasing potential energy is about the same. Anyway, hamiltonian-REM allows turning 
off specific interactions selectively (e.g. intra-molecular ones, solute-solvent ...) that REM 
does not. 
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The drawback of this technique is that REM-MD cannot be used to study time- 
dependent phenomena, since the MC-moves disrupt the time-sequence trajectory. 
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C. Some Ab Initio MD Details 


C.1 Hellman-Feynmann Theorem 


In quantum mechanics every observable variable is represented by a self-adjunct operator 
acting on a suitable Hilbert space. So, it appears that there is no space for the classical 
notion of force. However, the classical concept of “force” can be introduced into a quantum 
theoretical framework by the Hellman-Feynmann theorem that will be reported in this 
Appendix. 

I call H(A) the Hamiltonian operator of the system that happens to depend on the A 
parameter. |i£) is autoket of H(A) so that H(A)|e) = E(A)|pe). Then 


(pelA(A)|pe) = E(A) pele) = EA) 
Taking the derivative of E(A) with respect of A we obtain: 


EN _ È (ye lA(a)lpe) = 
ð 


= (2(#el) Aye) + (el (ZAA) |e) + lA (Flv) = 


= (30 E(W rl) |e) + E(pe| (sive) 
= (BD + £3 (ele) = (BO) , ca) 


where we have exploited the fact that 2 J (We|Pe) = 0 since the state |e) does not depend 


on A. The equality REA) = (2H (A)) can be related to classical “forces” if we adopt 
nuclear coordinates {Ry} as parameter for the Hamiltonian in eqn. 2.5, obtaining the 
following relation 


Vr„E(Rn) = (Vrs A (Ry)) = (pe |VRy A (Rw)| pE) 


But from the classical mechanics, we know that for conservative forces we have (eqn. 2.32) 


so that we can define the forces acting on the atomic nuclei as 
Fy = VryE(Rn) = (pe |Vrs A (Ry)| pE) , (C.2) 


w is the Hellmann-Feynman theorem. 
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C.2 Pulay Forces and BSSE 


Eqn. C.2 is the theoretical pillar of much ab initio MD and requires correct eigenfunc- 
tions to calculate reliable semi-classical forces. Nevertheless, SCF methods (like DFT) are 
more accurate in calculating eigenvalues than eigenfunctions. Moreover, force calculations 
based on eqn. C.2 are affected by another problem: we assumed that electronic eigenfunc- 
tions of H® were independent from {Ry} parameters, but this fact is only superficially 
true. 

In fact, electronic autokets are built as linear combinations of a vector basis |k) that, in 
real computer calculations, often depend on the nuclear coordinates. The Gaussian bases 
(see Appendix 2.1.5) in static ab initio calculations are usually constructed around nuclear 
position to better describe the static features of molecules. They are called localised bases 
for this reason. In static calculations localised bases do not create serious problems: the 
error induced by localised bases, called “basis set superposition error” (BSSE), can be ef- 
fectively tackled with some computational tricks, for example adopting the “counterpoise” 
procedure included into the Gaussian03 and Gaussian09 packages [FTSta, FTS*b]. 

However, adopting these Gaussian localised basis to describe molecules that actually 
move and interact with others, like in ab initio MD, the position changes introduce serious 
errors in the force calculations at every time-step. Thus, performing ab initio MD with 
localised bases, the forces are cursed by so-called “Pulay forces” [RP95] that are spurious. 
There is no currently available way to erase these spurious forces. 

This is one of the reason why CPMD employ non-localised plane waves that spans all 
the available space, which moreover are orthogonal by construction. 


C.3 Pseudopotentials 


Pseudopotentials are employed in CPMD in order to minimise the number of plane waves 
used to describe electrons. Since core electrons would require a huge amount of pane 
waves to correctly reproduce fast oscillations in their corresponding wavefunctions, pseu- 
dopotentials effectively replace core-electrons eigenvalues. Pseudopotentials have to cor- 
rectly simulate the limit behaviour of the core electrons wavefunctions where these ap- 
proach a certain cut-off radius. Inside this cut-off, the pseudopotential and the wave- 
functions should be slow oscillating to reduce the number of plane waves to employ. 
Moreover, it is an important requirement that the pseudopotential have to be transferable, 
which means that pseudopotentials could be adopted in calculations for different chemical 
environments producing results of the same accuracy. 

Usually pseudopotentials are categorised as “norm conserving” or “ultrasoft”. A 
norm conserving pseudopotential (e.g. see Ref. [TM91] and [GTH96]) is constructed to 
reproduce the valence properties of all electron-calculation and produce norm conserv- 
ing pseudo-wavefunctions that satisfy the usual orthonormality condition. To achieve 
this requirement, usually norm conserving pseudo-wavefunctions tend to retain a certain 
amount of oscillating behaviour. Ultrasoft pseudopotentials, on the other hand, create 
pseudo-wavefunctions that are as “smooth” as possible to minimise the range in Fourier 
space needed to accurately describe valence properties. Ultrasoft pseudopotentials have 
not been used in this dissertation since they do not allow to calculate with the desired 
accuracy some structural and spectroscopic properties (such as g(r), IR intensities and 
MIWF centres). 
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C.4 Van der Waals Forces 


As shown in Section 2.2.5, classical MD uses force fields containing Lennard-Jones type 
potentials to describe weak interactions such as van der Waals forces. Thus, in the stan- 
dard CPMD approach, van der Waals forces are almost entirely lost due to the shortcom- 
ings of actual implemented exchange-correlation functionals (see Appendix A.3). Stefan 
Grimme [Gri06] proposed a semi-empirical correction EGrimme to the DFT energy in order 
to retain dispersive effects: 

Etot = Eper + Ecrimme 


For a pair of atoms j and i, their interaction energy calculated within the Grimme approach 


takes the form z 
NA cl faump(Rji) 


6 
Rj 


EGrimme ii 
j=l i=14+j 


where c’ is the dispersion coefficient for the ji pair of atoms calculated as 
d= vcdi , 


N is the total number of atoms and f4ump is a dumping function used to avoid singularity 
when the interatomic distance Rji is small, defined as 
Samp Ri) = — 
d ii) = 7 
ump\ `i 1 + ef (Rji/Rji)—1 


with R; being the sum of the van der Waals radii for atoms i and j. 
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